(57) The present invention is directed to an isolated
DNA sequence coding for an enzyme involved in the
mevalonate pathway or the pathway from isopentenyl
pyrophosphate to farnesyl pyrophosphate, vectors or
plasmids comprising such DNA, hosts transformed by
either such DNAs or vectors or plasmids, and a process
for the production of isoprenoids and carotenoids by
using such transformed host cells.






EP0 955 363 A2 



Description 

[0001] The present invention relates to molecular biology for the manufacture of isoprenoids and biological materials
useful therefor.

[0002] Astaxanthin is known to be distributed in a wide variety of organisms such as animals (e.g. birds such as flamingo
and scarlet ibis, and fish such as rainbow trout and salmon), algae and microorganisms. It is also recognized that
astaxanthin has a strong antioxidant activity against oxygen radicals, which is expected to find pharmaceutical use
in protecting living cells against some diseases such as cancer. Moreover, from the viewpoint of industrial application,
demand for astaxanthin as a coloring reagent is increasing, especially in the farmed fish industry (e.g. salmon),
because astaxanthin imparts a distinctive orange-red coloration to the animals and contributes to consumer appeal in the
marketplace.

[0003] Phaffia rhodozyma is known as a carotenogenic yeast strain which produces astaxanthin specifically. Unlike
the other carotenogenic yeasts, Rhodotorula species, Phaffia rhodozyma (P. rhodozyma) can ferment some sugars
such as D-glucose. This is an important feature from the viewpoint of industrial application. In a recent taxonomic study,
a sexual cycle of P. rhodozyma was revealed and its teleomorphic state was designated Xanthophyllomyces
dendrorhous (W. I. Golubev; Yeast 11, 101-110, 1995). Some strain improvement studies to obtain hyperproducers
of astaxanthin from P. rhodozyma have been conducted, but in this decade such efforts have been restricted to
conventional mutagenesis and protoplast fusion. Recently, Wery et al. developed a host vector
system using P. rhodozyma in which a non-replicable plasmid was integrated into the genome of P.
rhodozyma at the ribosomal DNA locus in multiple copies (Wery et al., Gene, 184, 89-97, 1997). Verdoes et al.
reported improved vectors to obtain transformants of P. rhodozyma, as well as its three carotenogenic genes
which code for the enzymes that catalyze the reactions from geranylgeranyl pyrophosphate to β-carotene (International
patent WO97/23633). The importance of genetic engineering methods in strain improvement studies of P. rhodozyma
will increase in the near future in order to break through the productivity reached by conventional methods.

[0004] It is reported that the carotenogenic pathway from a general metabolite, acetyl-CoA, consists of multiple
enzymatic steps in carotenogenic eukaryotes as shown in Fig. 1. Two molecules of acetyl-CoA are condensed to yield
acetoacetyl-CoA, which is converted to 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by the action of
3-hydroxy-3-methylglutaryl-CoA synthase. Next, 3-hydroxy-3-methylglutaryl-CoA reductase converts HMG-CoA to
mevalonate, to which two phosphate residues are then added by the action of two kinases (mevalonate kinase and
phosphomevalonate kinase). Mevalonate pyrophosphate is then decarboxylated by the action of mevalonate pyrophosphate
decarboxylase to yield isopentenyl pyrophosphate (IPP), which becomes a building unit of a wide variety of isoprene
molecules which are necessary in living organisms. This pathway is called the mevalonate pathway, named after its
important intermediate, mevalonate. IPP is isomerized to dimethylallyl pyrophosphate (DMAPP) by the action of IPP
isomerase. Then, IPP and DMAPP are converted to a C10 unit, geranyl pyrophosphate (GPP), by head-to-tail condensation.
In a similar condensation reaction between GPP and IPP, GPP is converted to a C15 unit, farnesyl pyrophosphate (FPP),
which is an important substrate for the biosynthesis of cholesterol in animals and ergosterol in yeast, and for the
farnesylation of regulatory proteins such as RAS protein. In general, the biosynthesis of GPP and FPP from IPP and
DMAPP is catalyzed by one enzyme called FPP synthase (Laskovics et al., Biochemistry, 20, 1893-1901, 1981). On the
other hand, in prokaryotes such as eubacteria, isopentenyl pyrophosphate is synthesized by a different pathway, via
1-deoxyxylulose-5-phosphate from pyruvate, which is absent in yeast and animals (Rohmer et al., Biochem. J., 295,
517-524, 1993). In extensive studies of cholesterol biosynthesis, it was shown that the rate-limiting steps of
cholesterol metabolism lie in this mevalonate pathway, especially in its early steps catalyzed by HMG-CoA synthase
and HMG-CoA reductase. The inventors paid attention to the fact that the biosynthetic pathways of cholesterol and
carotenoids share the intermediate pathway from acetyl-CoA to FPP, and tried to improve the rate-limiting steps in the
carotenogenic pathway, which might exist in the mevalonate pathway, especially in early steps such as those
catalyzed by HMG-CoA synthase and HMG-CoA reductase, so as to improve the productivity of carotenoids, especially
astaxanthin.
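
The sequence of steps described in paragraph [0004] can be written down as an ordered list, which makes it easy to read off the enzymes lying between two intermediates. The following Python sketch is only an illustration: the data structure, the helper name `enzymes_between` and the label "mevalonate phosphate" for the singly phosphorylated intermediate are assumptions, and the enzyme of the first condensation step is left unnamed because the text does not name it.

```python
# Ordered steps from acetyl-CoA to FPP as described in paragraph [0004].
# Each entry is (substrate, product, enzyme). Illustrative structure only.
PATHWAY = [
    ("acetyl-CoA", "acetoacetyl-CoA", None),  # enzyme not named in the text
    ("acetoacetyl-CoA", "HMG-CoA", "HMG-CoA synthase"),
    ("HMG-CoA", "mevalonate", "HMG-CoA reductase"),
    # "mevalonate phosphate" is an assumed label for the intermediate
    ("mevalonate", "mevalonate phosphate", "mevalonate kinase"),
    ("mevalonate phosphate", "mevalonate pyrophosphate", "phosphomevalonate kinase"),
    ("mevalonate pyrophosphate", "IPP", "mevalonate pyrophosphate decarboxylase"),
    ("IPP", "DMAPP", "IPP isomerase"),
    ("IPP + DMAPP", "GPP", "FPP synthase"),
    ("GPP + IPP", "FPP", "FPP synthase"),
]


def enzymes_between(start, end):
    """Return the enzymes from the step consuming `start` to the step producing `end`."""
    substrates = [s for s, _, _ in PATHWAY]
    products = [p for _, p, _ in PATHWAY]
    first = substrates.index(start)
    last = products.index(end)
    if last < first:
        raise ValueError("end intermediate lies upstream of start")
    return [enzyme for _, _, enzyme in PATHWAY[first:last + 1]]


if __name__ == "__main__":
    # Early steps that the text singles out as candidate rate-limiting steps
    print(enzymes_between("acetoacetyl-CoA", "mevalonate"))
```

Listing the steps this way simply mirrors the prose; it says nothing about flux or regulation, which the text treats as the open experimental question.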

[0005] This invention is based on the above endeavor of the inventors. In accordance with this invention, the
genes and the enzymes involved in the mevalonate pathway from acetyl-CoA to FPP, which are biological materials
useful in the improvement of the astaxanthin production process, are provided. This invention involves cloning and
determination of the genes which code for HMG-CoA synthase, HMG-CoA reductase, mevalonate kinase, mevalonate
pyrophosphate decarboxylase and FPP synthase. This invention also involves the enzymatic characterization resulting
from the expression of such genes in suitable host organisms such as E. coli. These genes may be amplified in a
suitable host, such as P. rhodozyma, and their effects on carotenogenesis can be confirmed by cultivating
such a transformant in an appropriate medium under appropriate cultivation conditions.

[0006] According to the present invention, there is provided an isolated DNA sequence coding for an enzyme
involved in the mevalonate pathway or the reaction pathway from isopentenyl pyrophosphate to farnesyl
pyrophosphate. More specifically, the said enzymes are those having an activity selected from the group consisting of
3-hydroxy-3-methylglutaryl-CoA synthase activity, 3-hydroxy-3-methylglutaryl-CoA reductase activity, mevalonate kinase
activity, mevalonate pyrophosphate decarboxylase activity and farnesyl pyrophosphate synthase activity.

[0007] The said isolated DNA sequence may be more specifically characterized in that (a) it codes for the said
enzyme having an amino acid sequence selected from the group consisting of those described in SEQ ID NOs: 6, 7, 8,
9 and 10, or (b) it codes for a variant of the said enzyme selected from (i) an allelic variant, and (ii) an enzyme having
one or more amino acid addition, insertion, deletion and/or substitution and having the stated enzyme activity.
A particularly specified isolated DNA sequence mentioned above may be that which can be derived from a gene of Phaffia
rhodozyma and is selected from (i) a DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5; (ii) an isocoding or an
allelic variant for the DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5; and (iii) a derivative of a DNA sequence
represented in SEQ ID NOs: 1, 2, 4 or 5 with addition, insertion, deletion and/or substitution of one or more
nucleotide(s), and coding for a polypeptide having the said enzyme activity. Such derivatives can be made by recombinant
means on the basis of the DNA sequences as disclosed herein by methods known in the state of the art and disclosed
e.g. by Sambrook et al. (Molecular Cloning, Cold Spring Harbour Laboratory Press, New York, USA, second edition
1989). Amino acid exchanges in proteins and peptides which do not generally alter the activity are known in the state
of the art and are described, for example, by H. Neurath and R. L. Hill in "The Proteins" (Academic Press, New York,
1979, see especially Figure 6, page 14). The most commonly occurring exchanges are: Ala/Ser, Val/Ile, Asp/Glu,
[...], Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu [...]

[0008] The present invention also provides an isolated DNA sequence, which is selected from (i) a DNA sequence
represented in SEQ ID NO: 3; (ii) an isocoding or an allelic variant for the DNA sequence represented in SEQ ID NO:
3; and (iii) a derivative of a DNA sequence represented in SEQ ID NO: 3 with addition, insertion, deletion and/or
substitution of one or more nucleotide(s), and coding for a polypeptide having the mevalonate kinase activity.
[0009] Furthermore, the present invention is directed to those DNA sequences as specified above and as disclosed
e.g. in the sequence listing, as well as their complementary strands, or those which include these sequences, DNA
sequences which hybridize under standard conditions with such sequences or fragments thereof, and DNA sequences
which, because of the degeneration of the genetic code, do not hybridize under standard conditions with such
sequences but which code for polypeptides having exactly the same amino acid sequence.
[0010] "Standard conditions" for hybridization mean in this context the conditions which are generally used by a man
skilled in the art to detect specific hybridization signals and which are described, e.g. by Sambrook et al., "Molecular
Cloning", second edition, Cold Spring Harbour Laboratory Press 1989, New York, or preferably so-called stringent
hybridization and non-stringent washing conditions, or more preferably so-called stringent hybridization and stringent
washing conditions, which a man skilled in the art is familiar with and which are described, e.g. in Sambrook et al.
(s.a.). Furthermore, DNA sequences which can be made by the polymerase chain reaction by using primers designed on
the basis of the DNA sequences disclosed herein by methods known in the art are also an object of the present
invention. It is understood that the DNA sequences of the present invention can also be made synthetically as
described, e.g. in EP 747 483.

[0011] Further provided by the present invention is a recombinant DNA, preferably a vector and/or plasmid, comprising
a sequence coding for an enzyme functional in the mevalonate pathway or the reaction pathway from isopentenyl
pyrophosphate to farnesyl pyrophosphate. The said recombinant DNA vector and/or plasmid may comprise the regulatory
regions such as promoters and terminators as well as open reading frames of the above named DNAs.
[0012] The present invention also provides the use of the said recombinant DNA, vector or plasmid, to transform a
host organism. The recombinant organism obtained by use of the recombinant DNA is capable of overexpressing a DNA
sequence encoding an enzyme involved in the mevalonate pathway or the reaction pathway from isopentenyl
pyrophosphate to farnesyl pyrophosphate. The host organism transformed with the recombinant DNA may be useful in the
improvement of the production process of isoprenoids and carotenoids, in particular astaxanthin. Thus the present
invention also provides such a recombinant organism/transformed host.

[0013] The present invention further provides a method for the production of isoprenoids or carotenoids, preferably
carotenoids, which comprises cultivating the recombinant organism thus obtained.

[0014] The present invention also relates to a method for producing an enzyme involved in the mevalonate pathway or
the reaction pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate, which comprises culturing a
recombinant organism mentioned above, under a condition conducive to the production of said enzyme, and relates also to
[...]
[0015] The present invention will be understood more easily on the basis of the enclosed figures and the more
detailed explanations given below.

Fig. 1 depicts a scheme of the deduced biosynthetic pathway from acetyl-CoA to astaxanthin in P. rhodozyma.

Fig. 2 shows the expression study by using an artificial mvk gene obtained from an artificial nucleotide addition at
[...] P. rhodozyma. The cells from 50 [...] of broth were subjected to 10 [...]

[...] only open reading frame flanked [...] adaptation to a nutrition starvation, and so on. In such a case [...]
to the 5'-untranslated upstream region around the promoter sequence [...]
[...] determined.

[0024] At first, we cloned a partial gene fragment containing a portion of hmc gene, hmg gene, mvk gene, mpd gene
[...] gene using a degenerate PCR method. Degenerate PCR is a method to clone a gene of interest
which has high homology of amino acid sequence to [...] function. A degenerate primer, which is used [...]
the amino acid sequence to the corresponding [...]. In such a degenerate primer, a mixed primer [...]
code is generally used. In the invention [...]
depending on primers and genes to clone as described hereinafter.
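
The idea behind a degenerate primer, namely writing a conserved peptide back into nucleotides with ambiguity codes, can be sketched in a few lines of Python. This is an illustration rather than the inventors' design procedure: the function name `back_translate` is hypothetical, the table uses the standard genetic code with IUPAC ambiguity codes, and the peptide `GKYTIGLGQ` is an assumption obtained by reading the sense primer Hmgs1 (listed in TABLE 1) codon by codon.

```python
# IUPAC back-translation table for the standard genetic code.
# Leu, Arg and Ser have six codons each; the single ambiguous triplets used
# for them here (YTN, MGN, WSN) are a common compromise and also match a
# few codons of other amino acids.
BACK_TRANSLATION = {
    "A": "GCN", "R": "MGN", "N": "AAY", "D": "GAY", "C": "TGY",
    "Q": "CAR", "E": "GAR", "G": "GGN", "H": "CAY", "I": "ATH",
    "L": "YTN", "K": "AAR", "M": "ATG", "F": "TTY", "P": "CCN",
    "S": "WSN", "T": "ACN", "W": "TGG", "Y": "TAY", "V": "GTN",
}


def back_translate(peptide):
    """Return the degenerate nucleotide sequence encoding `peptide`."""
    return "".join(BACK_TRANSLATION[residue] for residue in peptide.upper())


if __name__ == "__main__":
    hmgs1 = "GGNAARTAYACNATHGGNYTNGGNCA"   # sense primer from TABLE 1
    peptide = "GKYTIGLGQ"                  # assumed peptide, read off the primer
    degenerate = back_translate(peptide)
    print(degenerate)
    # The primer stops one base short of the last codon, so compare prefixes.
    print(degenerate[:len(hmgs1)] == hmgs1)
```

Dropping the final, most degenerate base of the last codon, as Hmgs1 appears to do, keeps the 3' end of the primer unambiguous.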

[0025] An entire gene containing its coding region [...] terminator can be cloned from a chromosome [...]
or plasmid vector in an appropriate host, by using a partial DNA fragment [...] as a probe after it was labeled. [...]
phage vector, or a plasmid vector such as pUC vector, is often used [...] the construction of a library and the
following genetic manipulation such as sequencing [...]. In this invention, an EcoRI genomic library of P. rhodozyma
was constructed [...]. An insert size, i.e. what length of insert must be cloned, was determined by Southern blot
hybridization for each gene before the construction of a library. In [...] protocol which was prepared by the supplier
(Boehringer Mannheim) [...] a steroid hapten instead of conventional [...]-labeled DNA fragment which had a portion of
[...]. A genomic library [...]. Hybridized plaques were picked up and used for further study. In the case of [...]
(insert size [...] to 23 kb), prepared λDNA was digested by EcoRI, followed by the cloning of the EcoRI [...] a plasmid
vector such [...]. When λZAPII was used in [...] for the succeeding step of the cloning [...] a derivative of single
stranded M13 phage, ExAssist phage (Stratagene). A plasmid [...] for sequencing.

[0026] In this invention, we used the automated fluorescent DNA sequencer ALFred system (Pharmacia) using an
autocycle sequencing protocol in which the [...]

[0027] After the determination of the genomic sequence [...] region was used for a cloning of
cDNA of the corresponding gene. The PCR method was [...]
sequences were identical to the sequence at the 5'- and 3'- [...] an addition of an appropriate restriction site [...].
cDNA pool was used as a template in this PCR cloning [...] which were synthesized in vitro by the viral reverse
transcriptase [...] Clontech was used) by using the mRNA obtained from [...]

[...] involved in the biosynthetic pathway which consists of multiple steps of enzymatic [...]
genomic sequence. [...] simplest approach is to amplify the genomic
[...] genomic fragment encoding the enzyme of interest into the appropriate [...] in P. rhodozyma is harbored. A drug
resistance [...] the presence of a toxic antibiotic is often used [...]. [...] gene harbored in pGB-Ph9
(Wery et al. (Gene, 184, 89-97, 1997)) is an example [...] gene. A nutrition complementation marker can
also be used in a host which has an appropriate auxotrophy [...]. P. rhodozyma ATCC24221 strain, which requires
cytidine for its growth, is one example of the auxotroph [...] vector system using a nutrition complementation can be
[...]

One of the vectors is an integrated vector [...]. pGB-Ph9 is an example of this type of vector. Because such a vector
[...] an autonomous replicating sequence in the vector, the above vector cannot replicate by itself and can [...]
host as a result of a single-crossing [...] between a vector and a chromosome. In case of increasing a dose of the [...]
employed by using such a drug resistance marker. By increasing the concentration of the corresponding drug in the
selection medium, the strain in which the [...] result of recombination only can survive. By using such a selection, a
[...] which is coded by the amplified gene is expected to be overexpressed [...]. Another type of vector is [...]

[0031] Another strategy to overexpress an enzyme [...] promoter. In such a strategy, a [...]. This strategy is also
applied to overexpress a gene of [...] promoter whose promoter activity is induced in an appropriate growth phase and
at an appropriate timing [...]. [...] production of astaxanthin accelerates in a late phase of the growth such as the
case of production phase [...] may be maximized in a late phase of growth. In [...] biosynthesis enzyme decreases.
For example, by placing a gene which is involved in the biosynthesis of a precursor of astaxanthin and whose expression
is under the [...], such as the genes involved in the mevalonate pathway, downstream of the promoter [...] genes, [...]
the genes which are involved in the biosynthesis of astaxanthin become synchronized [...] expression.

[0032] Still another strategy to overexpress enzymes of interest is [...] the mutation in its regulatory elements.
For this purpose, a kind of reporter gene such as [...] fluorescent protein, and the like is inserted [...] sequence
of the gene of interest so that all the parts including promoter, terminator and the [...] each other. Transformed
P. rhodozyma in which the said reporter gene is [...] in vivo to induce mutation within the promoter region [...].
Mutation can be monitored by detecting [...] element of the gene, mutation point [...] approach, a gene cassette,
containing a reporter gene [...] its 5'-end and [...] terminator [...], is mutagenized and then introduced into
P. rhodozyma [...] mutation could be screened.

[...] introduced solely or co-introduced by harboring on plasmid [...] sequence, as well as its allelic variant, a
sequence [...] substitutions can be used as far as its [...] activity. And such a vector can be introduced into
P. rhodozyma by [...] on an appropriate selection medium such as [...] auxotroph ATCC24221 as a recipient.
Examples

[0036] The following materials and methods were employed in the Examples described below:
Strains 

[0037] P. rhodozyma ATCC96594 (This strain has been redeposited on April 8, 1998 as a Budapest Treaty deposit
under accession No. 74438).

E. coli DH5α: F-, φ80d, lacZΔM15, Δ(lacZYA-argF)U169, hsd (rK-, mK+), recA1, endA1, deoR, thi-1, supE44,
gyrA96, relA1 (Toyobo)

E. coli XL1-Blue MRF': Δ(mcrA)183, Δ(mcrCB-hsdSMR-mrr)173, endA1, supE44, thi-1, recA1, gyrA96, relA1,
lac[F' proAB, lacIqZΔM15, Tn10 (tetr)] (Stratagene)

E. coli SOLR: e14-(mcrA), Δ(mcrCB-hsdSMR-mrr)171, sbcC, recB, recJ, umuC::Tn5(kanr), uvrC, lac, gyrA96,
relA1, thi-1, endA1, λR, [F' proAB, lacIqZΔM15] Su- (nonsuppressing) (Stratagene, CA, USA)

E. coli XL1 MRA(P2): Δ(mcrA)183, Δ(mcrCB-hsdSMR-mrr)173, endA1, supE44, thi-1, gyrA96, relA1, lac (P2
lysogen) (Stratagene)

E. coli BL21 (DE3) (pLysS): dcm-, ompT, rB- mB-, lon-, λ(DE3), pLysS (Stratagene)

E. coli M15 (pREP4) (QIAGEN) (Zamenhof P. J. et al., J. Bacteriol. 110, 171-178, 1972)

E. coli KB822: pcnB80, zad::Tn10, Δ(lacU169), hsdR17, endA1, thi-1, supE44

E. coli TOP10: F-, mcrA, Δ(mrr-hsdRMS-mcrBC), φ80, ΔlacZ M15, ΔlacX74, recA1, deoR, araD139, (ara-leu)7697,
galU, galK, rpsL (Strr), endA1, nupG (Invitrogen)




Vectors

[0038]

λZAPII (Stratagene)
λDASHII (Stratagene)
pBluescriptII SK+ (Stratagene)
pUC57 (MBI Fermentas)
pMOSBlue T-vector (Amersham)
pET11c (Stratagene)
pQE30 (QIAGEN)
pCR2.1-TOPO (Invitrogen)

Media



[0039] P. rhodozyma strain is maintained routinely in YPD medium (DIFCO). E. coli strain is maintained in LB medium
(10 g Bacto-tryptone, 5 g yeast extract (DIFCO) and 5 g NaCl per liter). NZY medium (5 g NaCl, 2 g MgSO4-7H2O, 5 g
yeast extract (DIFCO), 10 g NZ amine type A (Sheffield) per liter) is used for λ phage propagation in a soft agar (0.7 %
agar (WAKO)). When an agar medium was prepared, 1.5 % of agar (WAKO) was supplemented.
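
Because the recipes are given per liter, preparing another volume is a simple proportional calculation. The sketch below is a convenience illustration only: the function name `scale_recipe` and the dictionary layout are hypothetical, and the agar percentages are assumed to be weight per volume (g per 100 ml), which the text does not state explicitly.

```python
# Per-liter recipes as given in paragraph [0039] (grams per liter).
RECIPES = {
    "LB": {"Bacto-tryptone": 10, "yeast extract": 5, "NaCl": 5},
    "NZY": {"NaCl": 5, "MgSO4-7H2O": 2, "yeast extract": 5, "NZ amine type A": 10},
}

# Agar percentages from the text, assumed here to be % (w/v).
AGAR_PERCENT = {"plate": 1.5, "soft": 0.7}


def scale_recipe(name, volume_ml, agar=None):
    """Return grams of each component needed for `volume_ml` of medium."""
    factor = volume_ml / 1000.0
    amounts = {component: grams * factor for component, grams in RECIPES[name].items()}
    if agar is not None:
        # percent (w/v) means grams per 100 ml
        amounts["agar"] = AGAR_PERCENT[agar] * volume_ml / 100.0
    return amounts


if __name__ == "__main__":
    # 200 ml of NZY soft agar for lambda phage overlays
    for component, grams in scale_recipe("NZY", 200, agar="soft").items():
        print(f"{component}: {grams:.2f} g")
```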
Methods

[0040] General methods of molecular genetics were practiced according to Molecular Cloning: a Laboratory Manual,
2nd Edition (Cold Spring Harbor Laboratory Press, 1989). Restriction enzymes and T4 DNA ligase were purchased
from Takara Shuzo (Japan).

[0041] Isolation of chromosomal DNA from P. rhodozyma was performed by using QIAGEN Genomic Kit (QIAGEN)
following the protocol supplied by the manufacturer. Mini-prep of plasmid DNA from transformed E. coli was performed
with the Automatic DNA Isolation system (PI-50, Kurabo Co. Ltd., Japan). Midi-prep of plasmid DNA from an E. coli
transformant was performed by using QIAGEN column (QIAGEN). Isolation of λ DNA was performed by Wizard lambda
preps DNA purification system (Promega) following the protocol of the manufacturer. A DNA fragment was isolated and
purified from agarose by using QIAquick or QIAEX II (QIAGEN). Manipulation of λ phage derivatives was done
according to the protocol of the manufacturer (Stratagene).

[0042] Isolation of total RNA from P. rhodozyma was performed by the phenol method using Isogen (Nippon Gene,
Japan). mRNA was purified from the total RNA thus obtained by using mRNA separation kit (Clontech). cDNA was
synthesized by using CapFinder cDNA construction kit (Clontech).

[0043] In vitro packaging was performed by using Gigapack III gold packaging extract (Stratagene).

[0044] Polymerase chain reaction (PCR) is performed with the thermal cycler from Perkin Elmer model 2400. Each
PCR condition is described in the examples. PCR primers were purchased from a commercial supplier or synthesized with
a DNA synthesizer (model 392, Applied Biosystems). Fluorescent DNA primers for DNA sequencing were purchased
from Pharmacia. DNA sequencing was performed with the automated fluorescent DNA sequencer (ALFred,
Pharmacia).

[0045] Competent cells of DH5α were purchased from Toyobo (Japan). Competent cells of M15 (pREP4) were
prepared by the CaCl2 method as described by Sambrook et al. (Molecular Cloning: a Laboratory Manual, 2nd Edition,
Cold Spring Harbor Laboratory Press, 1989).

Example 1 Isolation of mRNA from P. rhodozyma and construction of cDNA library

[0046] To construct a cDNA library of P. rhodozyma, total RNA was isolated by the phenol extraction method right after
cell disruption, and the mRNA from P. rhodozyma ATCC96594 strain was purified by using mRNA separation kit
(Clontech).

[0047] At first, cells of ATCC96594 strain from 10 ml of two-day culture in YPD medium were harvested by
centrifugation (1500 x g for 10 min.) and washed once with extraction buffer (10 mM Na-citrate/HCl (pH 6.2) containing
0.7 M KCl). After suspending in 2.5 ml of extraction buffer, the cells were disrupted by French press homogenizer
(Ohtake Works Corp., Japan) at 1500 kgf/cm2 and immediately mixed with two volumes of Isogen (Nippon Gene)
according to the method specified by the manufacturer. In this step 400 µg of total RNA was recovered.

[0048] Then this total RNA was purified by using mRNA separation kit (Clontech) according to the method specified
by the manufacturer. Finally, 16 µg of mRNA from P. rhodozyma ATCC96594 strain was obtained.

[0049] To construct a cDNA library, CapFinder PCR cDNA construction kit (Clontech) was used according to the
method specified by the manufacturer. One µg of purified mRNA was applied for a first strand synthesis followed by
PCR amplification. After this amplification by PCR, 1 mg of cDNA pool was obtained.
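
As a quick consistency check on the yields quoted in Example 1, the fraction of total RNA recovered as mRNA can be computed directly. This is an illustrative calculation only; it assumes both quantities are in the same unit (read here as micrograms), and the function name `percent_of` is hypothetical.

```python
# Quantities quoted in Example 1 (both read as micrograms).
TOTAL_RNA_UG = 400.0  # total RNA recovered after cell disruption
MRNA_UG = 16.0        # mRNA obtained after purification


def percent_of(part, whole):
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"mRNA recovered: {percent_of(MRNA_UG, TOTAL_RNA_UG):.1f} % of total RNA")
```

With the figures as quoted, the mRNA fraction comes out at 4 % of the total RNA.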

Example 2 Cloning of the partial hmc (3-hydroxy-3-methylglutaryl-CoA synthase) gene from P. rhodozyma

[0050] To clone a partial hmc gene from P. rhodozyma, a degenerate PCR method was exploited. Two mixed primers
were designed and synthesized, with the nucleotide sequences shown in TABLE 1, based on the common sequences of
known HMG-CoA synthase genes from other species.



TABLE 1

Sequence of primers used in the cloning of hmc gene
Hmgs1 : GGNAARTAYACNATHGGNYTNGGNCA (sense primer) (SEQ ID NO: 11)
Hmgs3 : TANARNSWNSWNGTRTACATRTTNCC (antisense primer) (SEQ ID NO: 12)
(N=A, C, G or T; R=A or G; Y=C or T; H=A, T or C; S=C or G; W=A or T)
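
Using the ambiguity codes defined under TABLE 1, one can count how many distinct oligonucleotides each mixed primer represents and turn the antisense primer back into its sense-strand reading. The Python sketch below is illustrative; the function names `degeneracy` and `reverse_complement` are hypothetical, and the IUPAC tables include a few codes (K, M, B, D, V) that the primers in TABLE 1 do not use.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a mixed primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def reverse_complement(primer):
    """Reverse complement that keeps IUPAC ambiguity codes."""
    return "".join(COMPLEMENT[base] for base in reversed(primer.upper()))


if __name__ == "__main__":
    hmgs1 = "GGNAARTAYACNATHGGNYTNGGNCA"
    hmgs3 = "TANARNSWNSWNGTRTACATRTTNCC"
    print("Hmgs1 degeneracy:", degeneracy(hmgs1))
    print("Hmgs3 degeneracy:", degeneracy(hmgs3))
    print("Hmgs3 sense-strand reading:", reverse_complement(hmgs3))
```

High degeneracy means each individual sequence is present at a low concentration in the primer mix, which is one reason annealing conditions in degenerate PCR are adjusted per primer pair.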

[0051] After the PCR reaction of 25 cycles of 95 °C for 30 seconds, 50 °C for 30 seconds and 72 °C for 15 seconds
by using ExTaq (Takara Shuzo) as a DNA polymerase and the cDNA pool obtained in Example 1 as a template, the reaction
[...]

[...] pHMC211 and used for further study.
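
The cycling program of Example 2 can be captured as a small data structure, which also gives the total programmed hold time. This is a sketch under stated assumptions: the class `Step` and the function `hold_time_seconds` are hypothetical names, and the figure ignores ramping between temperatures and any initial denaturation or final extension, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold within a PCR cycle."""
    temp_c: int
    seconds: int


# Cycle described in Example 2: 95 C for 30 s, 50 C for 30 s, 72 C for 15 s.
EXAMPLE_2_CYCLE = (Step(95, 30), Step(50, 30), Step(72, 15))
EXAMPLE_2_CYCLES = 25


def hold_time_seconds(cycle, n_cycles):
    """Total programmed hold time, excluding ramp times."""
    return n_cycles * sum(step.seconds for step in cycle)


if __name__ == "__main__":
    total = hold_time_seconds(EXAMPLE_2_CYCLE, EXAMPLE_2_CYCLES)
    print(f"{total} s ({total / 60:.2f} min) of programmed holds")
```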

Example 3 Isolation of genomic DNA from P. rhodozyma

[0052] To isolate genomic DNA from P. rhodozyma, QIAGEN genomic kit was used according to the method specified
by the manufacturer.

[0053] At first, cells of P. rhodozyma ATCC96594 strain from 100 ml of overnight culture in YPD medium were
harvested by centrifugation (1500 x g for 10 min.) and [...] mM EDTA). After suspending in 8 ml of [...]
concentration of 2 mg/ml to disrupt cells [...] mixture was incubated for 90 minutes
at 30 °C and then proceeded to the next extraction step. Finally, [...]

[0054] Southern blot hybridization was performed [...]. Two µg of genomic DNA was digested by [...]
acidic and alkaline treatment. The [...] membrane (Hybond N+, Amersham) by
using transblot (Joto Rika) for an hour. The DNA which was transferred to nylon membrane was fixed by a heat
treatment (80 °C, 90 minutes). A probe was prepared [...] pHMC211) [...] with the method specified by the [...]

Example 5 Cloning of [...] hmc gene
[0055] [...] µg of the genomic DNA was digested by EcoRI and [...]. DNAs whose length is within the range from [...]
according to the method specified by [...]. [...] DNA was ligated to 1 ng of EcoRI-digested and
CIAP (calf intestine alkaline phosphatase)-treated λZAPII [...] 16 °C overnight, and packaged by Gigapack
III gold packaging extract (Stratagene). The [...] E. coli XL1-Blue MRF' strain and overlaid
with NZY medium poured onto LB agar medium [...] EcoRI- and SalI-digested pHMC211 as a probe. Two plaques were
hybridized to [...] according to the method specified by [...] analysis and sequencing. As a result of
sequencing, the obtained EcoRI [...] same fragments in the opposite direction each other [...] sequence as that of
pHMC211 clone. One of these plasmids was designated as pHMC526 and used for [...] nucleotide sequence was obtained
[...] with about 1 kb of 3'-terminal untranslated region.
Example 6 Cloning of upstream region of hmc gene

[0056] [...] pHMC526 does not contain the 5' end of hmc gene. At first, the [...]
2 were synthesized.

TABLE 2

Sequence of primers used in the cloning of 5'-adjacent region of hmc gene
Hmc21 : GAAGAACCCCATCAAAAGCCTCGA (primary primer) (SEQ ID NO: 13)
Hmc22 : AAAAGCCTCGAGATCCTTGTGAGCG (nested primer) (SEQ ID NO: 14)
[0057] Protocols for library construction and PCR conditions were the same as those specified by the manufacturer,
using the genomic DNA preparation obtained in Example 3 as a PCR template. The PCR fragments that had an EcoRV
site at the 5' end (0.45 kb) and that had a PvuII site at the 5' end (2.7 kb) were recovered and cloned into pMOSBlue
T-vector by using E. coli DH5α as a host strain. As a result of sequencing 5 independent clones from each construct,
it was confirmed that the 5' adjacent region of hmc gene was cloned, and a small part (0.1 kb) of an EcoRI fragment
was found within its 3' end. The clone obtained from the PvuII construct in the above experiment was designated as
pHMCPv708 and used for further study.

[0058] Next, Southern blot analysis was performed by the method shown in Example 4 above, and it was determined that
the 5'-adjacent region of the hmc gene existed in a 3 kb EcoRI fragment. After construction of a 2.5 to 3.5 kb EcoRI
library in λZAPII, 600 plaques were screened and 6 positive clones were selected. As a result of sequencing these
6 clones, it was clarified that 4 of the 6 positive clones had the same sequence as that of pHMCPv708;
one of those was named pHMC723 and used for further analysis.

[0059] The PCR primers whose sequences are shown in TABLE 3 were synthesized to clone the small (0.1 kb) EcoRI
fragment located between the 3.5 kb and 3.0 kb EcoRI fragments on the chromosome of P. rhodozyma.



TABLE 3

Sequence of primers used in the cloning of the small EcoRI portion of hmc gene
Hmc30 : AGAAGCCAGAAGAGAAAA (sense primer) (SEQ ID NO: 15)
Hmc31 : TGGTGGAGGAAAQTAQAT (antisense primer) (SEQ ID NO: 16)



[0060] The PCR condition was the same as shown in Example 2. The amplified fragment (0.1 kb in length) was cloned
into pMOSBlue T-vector and transformed into E. coli DH5α. Plasmids were prepared from 5 independent white colonies
and subjected to sequencing.

[0061] Thus, the nucleotide sequence (4.8 kb) containing the hmc gene was determined (SEQ ID NO: 1). The coding
region spanned 2432 base pairs and consisted of 11 exons and 10 introns. Introns were scattered all through the coding
region without 5' or 3' bias. It was found that the open reading frame consists of 467 amino acids (SEQ ID NO: 6) whose
sequence is strikingly similar to the known amino acid sequences of HMG-CoA synthases from other species (49.6
% identity to HMG-CoA synthase from Schizosaccharomyces pombe).
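
The figures in paragraph [0061] allow a rough estimate of how much of the coding region is intron sequence. The calculation below is a sketch resting on two assumptions that the text does not state: that the 2432 bp span runs from the start codon through the stop codon, and that the stop codon is therefore counted with the exons. The function name `intron_summary` is hypothetical.

```python
def intron_summary(span_bp, n_residues, n_introns, include_stop=True):
    """Estimate exon bp, intron bp and mean intron length for a coding region.

    span_bp      -- genomic length from start codon to end of coding region
    n_residues   -- number of amino acids in the open reading frame
    n_introns    -- number of introns within the span
    include_stop -- whether the span is assumed to include the stop codon
    """
    codons = n_residues + (1 if include_stop else 0)
    exon_bp = 3 * codons
    intron_bp = span_bp - exon_bp
    if intron_bp < 0:
        raise ValueError("span is shorter than the coding sequence")
    return exon_bp, intron_bp, intron_bp / n_introns


if __name__ == "__main__":
    exon_bp, intron_bp, mean_intron = intron_summary(2432, 467, 10)
    print(f"exonic: {exon_bp} bp, intronic: {intron_bp} bp, "
          f"mean intron: {mean_intron:.1f} bp")
```

Under these assumptions the ten introns account for roughly 1 kb of the span, i.e. about 100 bp each on average; if the stop codon is not part of the 2432 bp, the intron total is 3 bp larger.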

Example 7 Expression of hmc gene in E. coli and confirmation of its enzymatic activity

[0062] The PCR primers whose sequences are shown in TABLE 4 were synthesized to clone a cDNA fragment of
hmc gene.



TABLE 4 

Sequence of primers used in the cloning of cDNA of hmc gene 



Hmc25 : GGTACCATATGTATCCTTGTACTACCGAAG (sense primer) (SEQ ID NO: 17) 
Hmc26 : GGATGCGGATGGTGAAGGAGAAGGGACGTG (antisense primer) (SEQ ID NO: 18) 
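
Primers used for expression cloning typically carry restriction sites at their 5' ends, and a short scan makes these visible. The Python sketch below is illustrative only: the function name `find_sites` is hypothetical, the small table of recognition sequences is a hand-picked subset, and the scan is applied to the sense primer Hmc25 exactly as printed in TABLE 4.

```python
# Recognition sequences of a few common restriction enzymes (5' -> 3').
SITES = {
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
    "EcoRI": "GAATTC",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        pos = seq.find(motif)
        while pos != -1:
            hits.setdefault(enzyme, []).append(pos)
            pos = seq.find(motif, pos + 1)
    return hits


if __name__ == "__main__":
    hmc25 = "GGTACCATATGTATCCTTGTACTACCGAAG"  # sense primer from TABLE 4
    for enzyme, positions in find_sites(hmc25).items():
        print(enzyme, positions)
```

The NdeI site in Hmc25 overlaps an ATG, which is the usual way of placing a start codon directly at the cloning site of a pET-type expression vector.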

[0063] The PCR condition was as follows: 25 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 3
minutes. As a template, 0.1 µg of the cDNA pool obtained in Example 2 was used, with Pfu polymerase as a DNA
polymerase. The amplified 1.5 kb fragment was recovered and cloned in pT7Blue-3 vector (Novagen) by using perfectly
blunt cloning kit (Novagen) according to the protocol specified by the manufacturer. Six independent clones from white
colonies of E. coli DH5α transformants were selected and plasmids were prepared from those transformants. As a result
of restriction analysis, 2 clones were selected for further selection by sequencing. One clone had an amino acid
substitution at position 280 (from glycine to alanine) and the other had one at position 53 (from alanine to threonine).
Alignment of amino acid sequences derived from known hmc genes showed that an alanine residue as well as a glycine
residue was commonly observed at position 280 in the sequences from other species, and this fact suggested that the
amino acid substitution at position 280 would not affect the enzymatic activity. This clone (mutant at position 280)
was selected as pHMC731 for the succeeding expression experiment.

[0064] Next, the 1.5 kb fragment obtained by NdeI- and BamHI-digestion of pHMC731 was ligated to pET11c
(Stratagene) digested by the same pair of restriction enzymes, and introduced into E. coli DH5α. As a result of
restriction analysis, a plasmid that had a correct structure (pHMC818) was recovered. Then, competent E. coli BL21
(DE3) (pLysS) cells (Stratagene) were transformed, and one clone that had a correct structure was selected for
further study.
[0065] For an expression study, strain BL21 (DE3) (pLysS) (pHMC818) and vector control strain BL21 (DE3) (pLysS)
(pET11c) were cultivated in 100 ml of LB medium at 37 °C until OD at 600 nm reached 0.8 (about 3 hours) in the
presence of 100 µg/ml of ampicillin. Then, the broth was divided into two portions of the same volume, and 1 mM of
isopropyl β-D-thiogalactopyranoside (IPTG) was added to one portion. Cultivation was continued for a further 4 hours at
37 °C. Twenty-five µl of broth was removed from the induced and uninduced cultures of the hmc clone and the vector
control cultures and subjected to sodium dodecyl sulfate - polyacrylamide gel electrophoresis (SDS-PAGE) analysis. It was
confirmed that a protein whose size was similar to the molecular weight deduced from the nucleotide sequence (50.8 kDa)
was expressed only in the case of the clone that harbored pHMC818 with the induction. Cells from 50 ml of broth were
harvested by centrifugation (1500 x g, 10 minutes), washed once and suspended in 2 ml of hmc buffer (200 mM Tris-HCl
(pH 8.2)). Cells were disrupted by a French press homogenizer (Ohtake Works) at 1500 kgf/cm² to yield a crude lysate.
After centrifugation of the crude lysate, a supernatant fraction was recovered and used as a crude extract for an
enzymatic analysis. Only in the case of the induced lysate of the pHMC818 clone, a white pellet was spun down and was
recovered. Enzyme assay for 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) synthase was performed by the photometric assay
according to the method by Stewart et al. (J. Biol. Chem. 241(5), 1212-1221, 1966). In none of the crude extracts was
the activity of 3-hydroxy-3-methylglutaryl-CoA synthase detected. As a result of SDS-PAGE analysis of the crude extract,
the expressed protein band that had been found in the expressed broth had disappeared. Subsequently, the white pellet
that was recovered from the crude lysate of the induced pHMC818 clone was solubilized with 8 M guanidine-HCl, and then
subjected to SDS-PAGE analysis. The expressed protein was recovered in the white pellet, and this suggested that the
expressed protein would form an inclusion body.
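
The SDS-PAGE check in this paragraph compares the observed band with the 50.8 kDa deduced from the nucleotide sequence. When only the residue count is at hand, a rough cross-check is possible with an average residue mass. The sketch below assumes the common rule of thumb of about 110 Da per residue; that constant and the function name are assumptions of this example, not values from the specification, and the result is only an approximation of a sequence-based calculation.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption, not from the source


def estimate_mass_kda(n_residues, avg_residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Rough protein mass in kDa from the number of amino acid residues."""
    if n_residues <= 0:
        raise ValueError("residue count must be positive")
    return n_residues * avg_residue_mass_da / 1000.0


# 467 residues is the ORF length reported for the hmc gene (SEQ ID NO: 6).
print(f"{estimate_mass_kda(467):.1f} kDa (rough estimate)")
```

The estimate lands within about 1 kDa of the deduced 50.8 kDa, which is adequate for judging whether an induced band on a gel is in the expected size range.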

[0066] Next, an expression experiment under milder conditions was conducted. Cells were grown in LB medium at 28
°C and the induction was performed by the addition of 0.1 mM of IPTG. Subsequently, incubation was continued for a
further 3.5 hours at 28 °C and then the cells were harvested. Preparation of the crude extract was the same as in the
previous protocol. Results are summarized in TABLE 5. It was shown that HMG-CoA synthase activity was only observed in
the induced culture of the recombinant strain harboring the hmc gene, and this suggested that the cloned hmc gene encodes
HMG-CoA synthase.

TABLE 5

Enzymatic characterization of hmc cDNA clone

plasmid     IPTG    µmol of HMG-CoA / minute / mg-protein
pHMC818     -       0
pHMC818     +       0.146
pET11c      -       0
pET11c      +       0



Example 8 Cloning of hmg (3-hydroxy-3-methylglutaryl-CoA reductase) gene

[0067] The cloning protocol of the hmg gene was almost the same as that of the hmc gene shown in Examples 2 to 7. At
first, the PCR primers whose sequences are shown in TABLE 6, based on the common sequences of HMG-CoA reductase
genes from other species, were synthesized.






TABLE 6 

Sequence of primers used in the cloning of hmg gene 



Red1 : GCNTGYTGYGARAAYGTNATHGGNTAYATGCC (sense primer) (SEQ ID NO: 19)
Red2 : ATCCARTTDATNGCNGCNGGYTTYTTRTCNGT (antisense primer) (SEQ ID NO: 20)
(N=A, C, G or T; R=A or G; Y=C or T; H=A, T or C; D=A, G or T)
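
Degenerate primers such as those in TABLE 6 are mixtures of every sequence allowed by the IUPAC ambiguity codes listed in the legend. The sketch below counts how many distinct oligonucleotides such a mixture contains and expands short degenerate stretches. The function names are illustrative assumptions, and the primer strings are used with the OCR-damaged character read as G and the stray space removed, which is also an assumption.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "M": "AC", "K": "GT",
    "H": "ACT", "D": "AGT", "B": "CGT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence (practical only for short primers)."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


red1 = "GCNTGYTGYGARAAYGTNATHGGNTAYATGCC"  # sense primer, TABLE 6 (assumed reading)
red2 = "ATCCARTTDATNGCNGCNGGYTTYTTRTCNGT"  # antisense primer, TABLE 6 (assumed reading)
print(degeneracy(red1), degeneracy(red2))
print(expand("GAR"))
```

High degeneracy lowers the effective concentration of any single matching oligonucleotide, which is one reason moderate annealing temperatures are commonly chosen for this kind of PCR.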

[0068] After the PCR reaction of 25 cycles of 95 °C for 30 seconds, 54 °C for 30 seconds and 72 °C for 30 seconds
by using ExTaq (Takara Shuzo) as a DNA polymerase, the reaction mixture was applied to agarose gel electrophoresis.
[...] manufacturer and then [...] from those transformants. As a result of sequencing, it [...]

[...] primers whose sequences are shown in TABLE 7 were synthesized.

TABLE 7

Sequence of primers used in the cloning of cDNA of hmg gene

Red8 : GGCCATTCCACACTTGATGCTCTGC (antisense primer) (SEQ ID NO: 21)
Red9 : GGCXXSATATCTTTATGGTCCT (sense primer) (SEQ ID NO: 22)






minute at 72 *C. . . ^ ^ _ «^r*«rm£iH tn Hnnp aenomic seauence which contains the entire hmg 

ment) e)dsted on the genome of P. rhodozymaas fou|^ in ^'^^^^^^ ^^SHII vector was constructed. The 
10072] Next,agenomlclib2ryca,«^^^^^^ 

packaged exlrac^was '"^^ed to £ cohXU »"«-^f^J2ru8ing 0.6 tof^ment of Stu\- d'^ested pRED107 as a 
onto LB agar medium. About 5000 plaque we^e ^a^eenea oy ua 9 ^ prepared and DNA was purified with 

probe. 4 plaques were hybridized to jf^^'^ ^^^^^^^^^^^ and was digested 

Wizard lambda purification systemaccortingtothe^^^^ 

with£c«flltoisolate10W)ofEcofflfragmertandtoclonejn£c^^^ 
agene). Eleven white colonies were selected and sutjieded * a PC"^^ "^^^ 

oBd. w*™. 0. 728S bK. pairs '^'^'^f^ ™^lSS^^Sw«Ke o( 10K an*» 

HMG-CoA reductase from Ustilago maydis). 

Example 9 Expression of carboxyl-terminal domain of hmg gene in E. coli

10073, Somespeciesofprokaryotesh^«.ut-eHM^^^^^ 
50 267, 5829-5834. 1992). However. eukarvoles. ^^^^^'^f^^^^ 19^. ,„ ^.^gi (i.e. Saccharomyces 
an^no-terminal membrane dom^n (Skaln.k ^tf'^J«^- ^^^ '^j^^^ doSis large and complex, contain- 
cerevis/aeandthesmutfungus. ^sh/ago maycAs) and in an n«ls*ern^^^ s g ^ 
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Ustilago maydis was expressed in active form in £ coli (Microbiology. 140. 2363-2370. 1994). The inventors of the 

E/^tl*^® PGR primere whose sequences were shown in TABLE 8 were synthesized to clone a partial cDNA 
fragment 0^ hmg genejhe sense pnmer sequence corresponds to the sequence which starts from 597th amino add 
g^te)res.due. and a length Of protein and CDNA whi^ 
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W TABLE 8 

Sequence of primers used in the cloning of a partial cDNA of hmg gene

Red54 : GGTACCGAAGAAATTATGAAGAGTGG (sense primer) (SEQ ID NO: 23)

Red55 : CTGCAGTCAGGCATCCACGTTCACAC (antisense primer) (SEQ ID NO: 24)

[0076] The PCR condition was as follows: 25 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 3
minutes. As a template, 0.1 µg of the cDNA pool obtained in Example [...] was used, and [...] as a DNA polymerase.
The amplified 1.5 kb fragment was recovered and cloned in pMOSBlue T-vector (Novagen). [...] by sequencing [...]
was named as pRED908.

[...] digestion of [...] was ligated to pQE30 (QIAGEN) [...] and transformed to E. coli KB822. As a result of
restriction analysis, a plasmid that had a correct structure (pRED1002) was recovered. Then, competent E. coli M15
(pREP4) cells [...] were transformed and one clone that had a correct structure was selected [...]

[0078] For an expression study, strain M15 (pREP4) (pRED1002) and vector control strain M15 (pREP4) (pQE30)
were cultivated in 100 ml of LB medium at 30 °C until OD at 600 nm reached 0.8 (about 5 hours) in the presence of
[...]. [...] added to one portion. Cultivation was continued for a further 3.5 hours at 30 °C. Twenty-five µl of broth
was removed from the induced and uninduced cultures of the hmg clone and the vector control cultures and subjected
to SDS-PAGE analysis. It was confirmed that a protein whose size was similar to the molecular weight deduced from the
nucleotide sequence (52.4 kDa) was expressed only in the case of the clone that harbored pRED1002 with the induction.
Cells from 50 ml of broth were harvested by centrifugation (1500 x g, 10 minutes), washed once and suspended in [...] of
hmg buffer (100 mM potassium phosphate buffer (pH 7.0) containing 1 mM of EDTA and 10 [...]). Cells were disrupted by
a French press (Ohtake Works) at 1500 kgf/cm² to yield a crude lysate. After centrifugation of the crude lysate, a
supernatant fraction was recovered and used as a crude extract for enzymatic analysis. Only in the case of the induced
lysate of the pRED1002 clone, a white pellet was spun down and was recovered. Enzyme assay for 3-hydroxy-
3-methylglutaryl-CoA (HMG-CoA) reductase was performed by the photometric assay according to the method by
Servouse et al. (Biochem. J. 240, 541-547, 1986). In none of the crude extracts was the activity of 3-hydroxy-3-
methylglutaryl-CoA reductase detected. As a result of SDS-PAGE analysis of the crude extract, the expressed protein
band that had been found in the expressed broth had disappeared. Next, the white pellet that was recovered from the
crude lysate of the induced pRED1002 clone was solubilized with an equal volume of 20 % SDS, and then subjected to
SDS-PAGE analysis. [...]

[0079] Next, an expression experiment was performed under milder conditions. Cells were grown in LB medium at 28
°C [...]. Preparation of the crude extract was the same as in the previous protocol. Results are summarized in TABLE 9.
It was shown that 30 times higher induction was observed, and this suggested that the cloned hmg gene codes for
HMG-CoA reductase.
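
The "30 times higher induction" stated above can be checked against the specific activities reported for pRED1002 in TABLE 9 (0.002 without IPTG and 0.059 with IPTG, in µmol of NADPH / minute / mg-protein). The short sketch below simply forms that ratio; the function name is an illustrative assumption.

```python
def fold_induction(induced, uninduced):
    """Ratio of induced to uninduced specific activity."""
    if uninduced <= 0:
        raise ValueError("uninduced activity must be positive to form a ratio")
    return induced / uninduced


# Specific activities for pRED1002 as reported in TABLE 9.
fold = fold_induction(0.059, 0.002)
print(f"{fold:.1f}-fold induction")
```

The ratio is 29.5, which rounds to the 30-fold figure in the text. No ratio can be formed for the pQE30 vector control, because both of its values are 0.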
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TABLE 9 





Enzymatic characterization of hmg cDNA clone

plasmid     IPTG    µmol of NADPH / minute / mg-protein
pRED1002    -       0.002
pRED1002    +       0.059
pQE30       -       0
pQE30       +       0

Example 10 Cloning of mevalonate kinase (mvk) gene



[0080] The cloning protocol of the mvk gene was almost the same as that of the hmc gene shown in Examples 2 to 7. At
first, the PCR primers whose sequences are shown in TABLE 10, based on the common sequences of mevalonate kinase
genes from other species, were synthesized.

TABLE 10 

Sequence of primers used in the cloning of mvk gene

Mk1 : GCNCCNGGNAARGTNATHYTNTTYGGNGA (sense primer) (SEQ ID NO: 25)
Mk2 : CCCCANGTNSWNACNGCRTTRTCNACNCC (antisense primer) (SEQ ID NO: 26)
(N=A, C, G or T; R=A or G; Y=C or T; H=A, T or C; S=C or G; W=A or T)


[0081] After the PCR reaction of 25 cycles of 95 °C for 30 seconds, 46 °C for 30 seconds and 72 °C for 15 seconds
by using ExTaq as a DNA polymerase, the reaction mixture was applied to agarose gel electrophoresis. A 0.6 kb PCR
band whose length was expected to contain a partial mvk gene was recovered and purified by QIAquick according to
the method indicated by the manufacturer and then ligated to pMOSBlue T-vector. After a transformation of competent
E. coli DH5α cells, 4 white colonies were selected and plasmids were isolated. As a result of sequencing, it was found
that one of the clones had a sequence whose deduced amino acid sequence was similar to known mevalonate kinase
genes. This cDNA clone was named as pMK128 and used for further study.

[0082] Next, a partial genomic clone which contained the mvk gene was cloned by PCR. The PCR primers whose
sequences are shown in TABLE 11, based on the internal sequence of pMK128, were synthesized.


TABLE 11 

Sequence of primers used in the cloning of genomic DNA containing mvk gene

Mk5 : ACATGGTGTAGTGGATG (sense primer) (SEQ ID NO: 27)
Mk6 : ACTGGGATTGGATGGA (antisense primer) (SEQ ID NO: 28)



[0083] PCR condition was 25 cycles of 30 seconds at 94 °C, 30 seconds at 55 °C and 1 minute at 72 °C. The amplified
1.4 kb fragment was cloned into pMOSBlue T-vector. As a result of sequencing, it was confirmed that a genomic fragment
containing the mvk gene, which had typical intron structures, had been obtained, and this genomic clone was named as
pMK224.

[0084] Southern blot hybridization study was performed to clone a genomic fragment which contained the entire mvk
gene from P. rhodozyma. A probe was prepared by labeling a template DNA, pMK224 digested by NcoI, with the DIG
multipriming method. Hybridization was performed with the method specified by the manufacturer. As a result, the
labeled probe hybridized to a band 6.5 kb in length. Next, a genomic library consisting of 5 to 7 kb EcoRI fragments
was constructed in the λZAPII vector. The packaged extract was infected to E. coli XL1-Blue MRF' strain (Stratagene)
and over-laid with NZY medium poured onto LB agar medium. About 5000 plaques were screened by using the 0.8
kb fragment of NcoI-digested pMK224 as a probe. Seven plaques hybridized to the labeled probe. Then a phage
lysate was prepared according to the method specified by the manufacturer (Stratagene) and in vivo excision was
performed by using E. coli XL1-Blue MRF' and SOLR strains. Fourteen white colonies were selected and plasmids were
isolated from those selected transformants. Then, the isolated plasmids were digested by NcoI and subjected to Southern
blot hybridization with the same probe as the plaque hybridization. The insert fragments of all the plasmids hybridized
to the probe, and this suggested that a genomic fragment containing the mvk gene had been cloned. A plasmid from
one of the positive clones was prepared and was named as pMK701. About 3 kb of sequence was determined by the
primer walking procedure and it was revealed that the 5' end of the mvk gene was not included in pMK701.

[0085] Next, a PCR primer which had the sequence

TTGTTGTCGTAGCAGTGGGTGAGAG (SEQ ID NO: 29) was synthesized to clone the 5'-adjacent genomic region of
the mvk gene with the Genome Walker Kit according to the method specified by the manufacturer (Clontech). A specific
1.4 kb PCR band was amplified and cloned into pMOSBlue T-vector. All of the selected DH5α transformants had the
expected length of the insert. Subsequent sequencing revealed that the 5'-adjacent region of the mvk gene had been
cloned. One of the clones was designated as pMKEVR715 and used for further study. As a result of Southern blot
hybridization using the genomic DNA prepared in Example 3, the labeled pMKEVR715 hybridized to a 2.7 kb EcoRI band.
Then a genomic library in which EcoRI fragments from 1.4 to 3.0 kb in length were cloned into λZAPII was constructed
and screened with the 1.0 kb EcoRI fragment from pMKEVR715. Fourteen positive plaques were selected from 5000 plaques
and plasmids were prepared from those plaques with the in vivo excision procedure.

[0086] The PCR primers whose sequences are shown in TABLE 12, taken from the internal sequence of
pMKEVR715, were synthesized to select a positive clone with a colony PCR.

TABLE 12 

PCR primers used for colony PCR to clone 5'-adjacent region of mvk gene
Mk17 : GGAAGAGGAAGAGAAAAG (sense primer) (SEQ ID NO: 30)
Mk18 : TTGCCGAACTCAATGTAG (antisense primer) (SEQ ID NO: 31)


[0087] PCR condition was as follows: 25 cycles of 30 seconds at 94 °C, 30 seconds at 50 °C and 15 seconds at 72
°C. All the candidates except one clone yielded the positive 0.5 kb band. One of the clones was selected and
named as pMK723 to determine the sequence of the upstream region of the mvk gene. After sequencing of the 3'-region
of pMK723 and combining it with the sequence of pMK701, the genomic sequence of a 4.8 kb fragment containing the mvk
gene was determined. The mvk gene consists of 4 introns and 5 exons (SEQ ID NO: 3). The deduced amino acid
sequence, except for 4 amino acids at the amino terminal end (SEQ ID NO: 8), showed an extensive homology to known
mevalonate kinase (44.3 % identity to mevalonate kinase from Rattus norvegicus).

Example 11 Expression of mvk gene by the introduction of 1 base at amino terminal region


[0088] Although the amino acid sequence showed a significant homology to known mevalonate kinase, an appropriate
start codon for the mvk gene could not be found. This result suggested that the cloned gene might be a pseudogene for
mevalonate kinase. To test this assumption, PCR primers whose sequences are shown in TABLE 13 were synthesized
to introduce an artificial nucleotide which resulted in the generation of an appropriate start codon at the amino
terminal end.



TABLE 13 

PCR primers used for the introduction of a nucleotide into mvk gene

Mk33 : GGATGGATGAGAGGGGAAAAAGAAGA (sense primer) (SEQ ID NO: 32) 

Mk34 : GTGGAGTGAAGGAAAAGAGGAAGGAG (antisense primer) (SEQ ID NO: 33) 

[0089] The artificial amino terminal sequence thus introduced was as follows: NH2-Met-Arg-Ala-Gln. After the PCR
reaction of 25 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 30 seconds by using ExTaq polymerase
as a DNA polymerase, the reaction mixture was subjected to agarose gel electrophoresis. An expected 1.4 kb
PCR band was amplified and cloned into pCR2.1-TOPO vector. After a transformation of competent E. coli TOP10 cells,
6 white colonies were selected and plasmids were isolated. As a result of sequencing, it was found that one clone had
only one change of amino acid residue (Asp to Gly change at the 81st amino acid residue in SEQ ID NO: 8). This plasmid
was named as pMK1130 #3334 and used for further study. Then, the insert fragment of pMK1130 #3334 was cloned
into pQE30. This plasmid was named as pMK1209 #3334. After the transformation of the expression host, M15 (pREP4),
an expression study was conducted. M15 (pREP4) (pMK1209 #3334) strain and vector control strain (M15 (pREP4)
(pQE30)) were inoculated into 3 ml of LB medium containing 100 µg/ml of ampicillin. After the cultivation at 37 °C for
3.75 hours, the cultured broth was divided into two portions. 1 mM IPTG was added to one portion and the incubation was
continued for 3 hours. Cells were harvested from 50 µl of broth by centrifugation and were subjected to SDS-PAGE
analysis. A protein which had the expected molecular weight of 48.5 kDa was induced by the addition of IPTG in the
culture of M15 (pREP4) (pMK1209 #3334), though no induced protein band was observed in the vector control culture
(Fig. 2). This result suggested that an activated form of the mevalonate kinase protein could be expressed by the
artificial addition of one nucleotide at the amino terminal end.

Example 12 Cloning of the mevalonate pyrophosphate decarboxylase (mpd) gene

[0090] The cloning protocol of the mpd gene was almost the same as that of the hmc gene shown in Examples 2 to 7. At
first, the PCR primers whose sequences are shown in TABLE 14, based on the common sequences of mevalonate
pyrophosphate decarboxylase genes from other species, were synthesized.

TABLE 14 

Sequence of primers used in the cloning of mpd gene
Mpd1 : HTNAARTAYTTGGGNAARMGNGA (sense primer) (SEQ ID NO: 34)
Mpd2 : GCRTTNGGNCCNGCRTCRAANGTRTANGC (antisense primer) (SEQ ID NO: 35)

(N=A, C, G or T; R=A or G; Y=C or T; H=A, T or C; M=A or C)



[0091] After the PCR reaction of 25 cycles of 95 °C for 30 seconds, 50 °C for 30 seconds and 72 °C for 15 seconds
by using ExTaq as a DNA polymerase, the reaction mixture was subjected to agarose gel electrophoresis. A 0.9 kb PCR
band whose length was expected to contain a partial mpd gene was recovered and purified by QIAquick according to
the method prepared by the manufacturer and then ligated to pMOSBlue T-vector. After a transformation of competent
E. coli DH5α cells, 6 white colonies were selected and plasmids were isolated. Two of the 6 clones had the expected
length of insert. As a result of sequencing, it was found that one of the clones had a sequence whose deduced amino acid
sequence was similar to known mevalonate pyrophosphate decarboxylase genes. This cDNA clone was designated as
pMPD129 and used for further study.

[0092] Next, a partial genomic fragment which contained the mpd gene was cloned by PCR. As a result of PCR whose
condition was the same as that of the cloning of the partial cDNA fragment, an amplified 1.05 kb fragment was obtained
and was cloned into pMOSBlue T-vector. As a result of sequencing, it was confirmed that a genomic fragment containing
the mpd gene, which had typical intron structures, had been obtained, and this genomic clone was named as pMPD220.

[0093] Southern blot hybridization study was performed to clone a genomic fragment which contained the entire mpd
gene from P. rhodozyma. A probe was prepared by labeling a template DNA, pMPD220 digested by KpnI, with the DIG
multipriming method. Hybridization was performed with the method specified by the manufacturer. As a result, the probe
hybridized to a band 7.5 kb in length. Next, a genomic library consisting of 6.5 to 9.0 kb EcoRI fragments
in the λZAPII vector was constructed. The packaged extract was infected to E. coli XL1-Blue MRF' strain and over-laid
with NZY medium poured onto LB agar medium. About 6000 plaques were screened by using the 0.6 kb fragment
of KpnI-digested pMPD220 as a probe. Four plaques hybridized to the labeled probe. Then a phage lysate was
prepared according to the method specified by the manufacturer (Stratagene) and in vivo excision was performed by using
E. coli XL1-Blue MRF' and SOLR strains. Three white colonies derived from each of the 4 positive plaques were selected
and plasmids were isolated from those selected transformants. Then, the isolated plasmids were subjected to a colony PCR
method whose protocol was the same as that in Example 8. PCR primers whose sequences are shown in TABLE 15, based
on the sequence found in pMPD129, were synthesized and used for the colony PCR.
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TABLE 15 

Sequence of primers used in the colony PCR to clone a genomic mpd clone
Mpd7 : GGGAAGTGTGGGTGATGGGG (sense primer) (SEQ ID NO: 36)
Mpd8 : CA(3ATCAGCQCOTGeiAGTGA (antisense primer) (SEQ ID NO: 37)

PCRc«^«-v«salmost*esameas«,edon.^^^^^^^^^ 
LT^'c and 10 seconds at 72 -C. All the clone exc^ one ^-^"^^ ^^'^J about 3 Kb of sequence thereof 

synthase genes from other species were synthesized. 
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TABLE 16 
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10096, After the PGR reaction of 25 cy<^f/J^;^:^r2S;^^^^^^^^^ 

by using ExTaq as a DNA polymerase, a reaction mi^re "^J^^'IJ^ ^ ^ method prepared by the manu- 
Ks a d'esired length (0.5 Kb) was recover^ P"''^^^^^'^^"^^^^ «,// DH5« cells. 6 white colonies were 
facturerand thenligated to PUC57 vector Afterafr^^^^ ^^^^ ^ 3„ i„3ert fragmert w^ 

selected and plasmids were then isolated^ On^^^^ 
sequenced. Asaresult, it was found ttet the cloneh^as^^^^^ 

toknownfarnesylpyrophosphatesynthasegen^.^^^^ 

[0097] Next. agenomicfragmert wasclonedby POTt^ngtnes^ ^ sequenced. Th.s 

pFPS113andusedfbr^urth^c«p^^ 

&!rpa"- 

TABli 17 

Sequences of primers used for a cl oning of adjacent region of »s gene 

- Fbs7 • ATCCTCATOCa.A.CiU. lU^IALI Uonse for downstream don.ng, (SbU ID NU 4UJ 
SiZoGC^TCAAGAGATCGATC^tantisenseforupstreamc.^^^^^^^ 

10099, An^l^ed PGR ba^s were '-"^trnr^X"^^ 
So^dUtheS'-adiacent region that^2.^^^^^^^^ 

retSTtw^o^j-'wryS^^ 

and cDNA clone for gene expression in £ coli. 
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TABLE 18 



Sequences of primers used for a cDNA and genomic fps cloning 



Fps27 ; GAATTCATATQTCCACTACXSiCCTGA (sense primer) {SEQ ID NO: 42) 
Fps28 : GTCGACX3GTACCTATCACTCCCGCC (antisense primer) (SEQ ID NO: 43) 



[0100] PCR condition was as follows: 25 cycles of 30 seconds at 94 °C, 30 seconds at 50 °C and 30 seconds at 72
°C. One cDNA clone that had a correct sequence was selected as a result of sequencing analysis of the clones obtained
by PCR and was named as pFPS113. Next, Southern blot hybridization study was performed to clone a genomic fragment
which contained the entire fps gene from P. rhodozyma. A probe was prepared by labeling a template DNA,
pFPS113, with the DIG multipriming method. As a result, the labeled probe hybridized to a band about 10 kb in length.



[0101] Next, a genomic library consisting of 9 to 15 kb EcoRI fragments was constructed in the λDASHII vector. The
packaged extract was infected to E. coli XL1-Blue MRA(P2) strain (Stratagene) and over-laid with NZY medium poured
onto LB agar medium. About 10000 plaques were screened by using the 0.6 kb fragment of SacI-digested pFPS113
as a probe. Eight plaques hybridized to the labeled probe. Then a phage lysate was prepared according to the
method specified by the manufacturer (Promega). All the plaques were subjected to a plaque PCR using the Fps27 and
Fps28 primers. Template DNA for the plaque PCR was prepared by heating 2 µl of a solution of phage particles for 5
minutes at 99 °C prior to the PCR reaction. The PCR condition was the same as that of the pFPS113 cloning described
hereinbefore. All the plaques gave a positive 2 kb PCR band, and this suggested that these clones had an entire region
containing the fps gene. One of the λDNAs that harbored the fps gene was digested with EcoRI to isolate a 10 kb EcoRI
fragment, which was cloned in EcoRI-digested and CIAP-treated pBluescriptII KS- (Stratagene). Twelve white colonies
from transformed E. coli DH5α cells were selected and plasmids were prepared from these clones and subjected to colony
PCR by using the same primer set of Fps27 and Fps28 and the same PCR condition. A positive 2 kb band was yielded from
3 of the 12 candidates. One clone was selected and named as pFPS603. It was confirmed that the sequences of the fps
gene which had previously been determined from pFPSSTu117 and pFPSSTd117 were almost correct although they had some
PCR errors. Finally, the nucleotide sequence of 4092 base pairs which contains the fps gene from P.
rhodozyma was determined (Fig. 3), and an ORF which consisted of 365 amino acids with 8 introns was found (SEQ ID
NO: 5). The deduced amino acid sequence (SEQ ID NO: 10) showed an extensive homology to known FPP synthase (65 %
identity to FPP synthase from Kluyveromyces lactis).
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SEQUENCE LISTING 

(1) GENERAL INFORMATION:

  (i) APPLICANT: F. HOFFMANN-LA ROCHE AG
  (ii) TITLE OF INVENTION: Improvement of microbiological carotenoid production and biological materials therefor
  (iii) NUMBER OF SEQUENCES: 43
  (iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE:
    (B) STREET: Grenzacherstrasse 124
    (C) CITY: BASLE
    (E) COUNTRY: SWITZERLAND
    (F) ZIP: CH-4002
  (v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: PatentIn Release #1.0, Version #1.25
  (vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
    (C) CLASSIFICATION:
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(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 061-688 25 11 

(B) TELEFAX: 061-688 13 95 

(C) TELEX: 962292/965542 hlr c 



(2) INFORMATION FOR SEQ ID NO:1:

  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 6370 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (iii) HYPOTHETICAL: NO
  (iv) ANTI-SENSE: NO
  (vi) ORIGINAL SOURCE:
    (A) ORGANISM: Phaffia rhodozyma
    (B) STRAIN: ATCC96594
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1441..1466
  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 1467..1722
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1723..1813






  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 1814..1914
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1915..2535
  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2536..2621
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2622..2867
  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2868..2942
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2943..3897
  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 3898..4030
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 4031..4516
  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 4517..4616
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 4617..4909
  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 4910..5007
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 5008..5081
  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 5082..5195
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 5196..5446
  (ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 5447..5523
  (ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 5524..5756
  (ix) FEATURE:
    (A) NAME/KEY: polyA_site
    (B) LOCATION: 6173
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

^ GGAAGACATG AT6GT6TGG6 TGTGAGTATG AGCGTGAGCG T6GGTATGGG CCTGGGTGTG 60 

GGTATGAGCG GTGGTGGTGA TGGATGGATG G6TGGGTGGC GTGGAGGGGT CCGTGCGGCA 120 

AGATGTTTTC TCTGGGTAGG AGCGTTCTGC ATTGGGGCAG GAGAAAAAAT AGTGTGGTTA 180 

^0 CGGGAGATCG TGGTTACATC AAGCCATCGT CACTGTAAGG CTCTGTAAGG CTCGGTTGTT 240 

AAGAAGGTAA CCAAGTGTAA TCACTTGGTT CGCGGGGTGA CACTTAGGCT CTG6CGATTA 300 

ATATATCTGA AGCAGACCAA ACTATTAACA ATATACTTTT G6ATAAGAGG TTTCAACAAG 360 

IS AATCTCAGCT TGAGGAAAAC TCTTATCCAA GAAGGCGCGA GGGCGTCCCC CTTTTATATC 420 

AGGACCCCTC GCOCATTTGG TCTGCCACTA AAGATATACA TATGACGAGC CTAGAGAGGC 480 

TCGAGATCAC GAAAACTAAA AAGATGAAGC ATGAACCATG CAAACTAGAG CATGATGGAA 540 

AATGGGCGAA GAGGCATAAG GGATGGAGGG AACGAATAGC CTGTAGGGGT AACCCACGTA 600 

AGAGAACACG TGATACTTAA CCCGTATCCC TGACAGTCAC GGTGTTTCTT GAGAGTCAGT 660 

AATGTCCAGC TGTGACCTCA CGTGACTAAA CCCGACACGT GTGCTTCGAC CGAGGTGGGA 720 

CGATCTTTTT TTTGGGGGGA GAAACCGAGT GGGACGATAG AGAGGACTAC GGAGAACTGT 780 

AGTGAATTGT AGTGCGCTCA CTACGGAGAG TTCTAGTTGA GCAAGCGATG TGATTTTCAA 840 

TACAATCCCG GACTACAAGC TCTCTAATAG AGCTCTATAA TAGAAGGACA AAAGTCGTCC 900 

CACTCCTATC TCCCGCGCGT TTTAATAGAG ACCGATTGTT TTTTTCCCTA ATGTTTTATT 960 

TTCTTTCCCC GATCGGCTCA TTTTTCTTCT CTCCGCGTAT TCTTCACACA ACGCTCCCTC 1020 

CGATCTTTTT TCTTCTTGTT CCTGTTCCTC TTCGTCTCCT TCCATTGTCT TCTTTCCTTC 1080 

CTTCCTTCCT TCTTGCCTCT AGCCAGCTTC AACAGCGACG TCTCTCTCTC TCTGTGTGGT 1140 

^ GATCTCCGAC TGTACTGTCT CTCTCGGTCA CTTTCACGAA TCAACTTCGT TTCTTTTCTG 1200 

ATCGATCGGT CGTCTTTCCC TCAATCCGTG CATACACTCA CACTTACACT CACACCCACA 1260 

CACTCAAACA CGCTAAATAA TCAGATCCOT CTCCCCTTCT TGATCTCCTT CGGCTTAGGC 1320 

40 AATGGCTTCC TTGTTCGGCC TCCGGCGGTC CTCAAACGAG CAGCCGCGCT CTCCTCTGCT 1380 

CATCCAATCG AAGTCATCCT TTCTACCTTT GTCGTGGTCA CCTTGACGTA CTTTCAGTTG 1440 

ATGTACACCA TCAAGCACAG TAATTTGTAC GTCC6ATCAT CTATTTGTCG TGTTCTCCTT 1500 

AGTCTCTTTC TCTTCCTCCT TT6TCTTTCG CGTCAGCGTG GCTGGATTTC CGTCTCCATG 1560 

TCATTTCCCT TATTTCCTCT TCCTGTCATT TGTTCCTCTA CTTTTCTTTC TCTACCTCCT 1620 

TTCCCTGTCG TTTGCTTTCC TTCGCCAGTT 6ACCACCGAT CCTCAGGATT CATGGCTAAC 1680 

ATGCCCAACA CAAACTTGCA TATCATCTCT CTTCGTCCAC AGTCTTTCTC AGACGATTAG 1740 

CACACAATCT ACCACCAGCT GGGTCGTCGA TGCGTTCTTC TCTTTGGGAT CCAGATACCT 1800 

TGACCTCGCG AAGGTTAGTC AGTTGACCCT CTCATGCTTC TTTTCTCTCA GTCTTGTGTG 1860

TGCGCATATA CCCACTCATA GACATCTTCG TACGCTGCAC TTTCCCTCCC TTAOCAAGCA 1920

GACTCGGCCG ATATCTTTAT GGTCCTCCTC GGTTACGTCC TTATGCACGG CACATTCGTC 1980

CGACTGTTCC TCAACTTTCG TCGGATGGGC GCAAACTTTT GGCTGCCAGG CATGGTTCTT 2040

GTCTCGTCCT CCTTTGCCTT CCTCACCGCC CTCCTCGCCG CCTCGATCCT CAACGTTCCG 2100

ATCGACCCGA TCTGTCTCTC GGAAGCACTT CCCTTCCTCG TGCTCACCGT CGGATTTGAC 2160

AAGGACTTTA CCCTCGCAAA ATCTGTGTTC AGCTCCCCAG AAATCGCACC CGTCATGCTT 2220

AGACGAAAGC CGGTGATCCA ACCAGGAGAT GACGACCATC TCGAACAGGA CGAGCACAGC 2280

AGAGTGGCCG CCAACAAGGT TGACATTCAG TGGGCCCCTC CGGTCGCC0C CTCCCGTATC 2340

GTCATTGGCT CGGTCGAGAA GATCGGGTCC TCGATCGTCA GAGACTTTGC CCTCGAGGTC 2400

GCCGTCCTCC TTCTCGGAGC CGCCAGCGGG CTCGGCGGAC TCAAGGAGTT TTGTAAGCTC 2460

GCCGCGTTAA TTTTGGTGGC CGACTGCTGC TTCACCTTTA CCTTCTATGT CGCCATCCTC 2520

ACCGTCATGG TCGAGGTAAG CCTTTTCTTC AAGTTTCTTG CTGTCATTTT CCTTTCGACA 2580

CGTATGCTCA TCTTTCGTTT CCGTCTCTCT CACCTTTCCA GGTTCACCGA ATCAAGATCA 2640

TCCGGGGCTT CCGACCGGCC CACAATAACC GAACACCGAA TACTGTGCCC TCTACCCCTA 2700

CTATCGACGG TCAATCTACC AACAGATCCG GCATCTCGTC AGGGCCTCCG GCCCGACCGA 2760

CCGTGCCCGT GTGGAAGAAA GTCTGGAGGA AGCTCATGGG CCCAGAGATC GATTGGGCGT 2820

CCGAAGCTGA GGCTCGAAAC CCGGTTCCAA AGTTGAAGTT GCTCTTAGTA AGTAAACTTC 2880

CTTTGTTCTT CTCATCATTC TTTATCTCCG AATCCTGACG TCGGACCCTT CTCGATTCAA 2940

AGATCTTGGC CTTTCTTATC CTTCATATCC TCAACCTTTG CACGCCTCTG ACCGAGACCA 3000

CAGCTATCAA GCGATCGTCT AGCATACACC AGCCCATTTA TGCCGACCCT GCTCATCCGA 3060

TCGCACAGAC AAACACGACG CTCCATCGGG CGCACAGCCT AGTCATCTTT GATCAGTTCC 3120

TTAGTGACTG GACGACCATC GTCGGAGATC CAATCATGAG CAAGTGGATC ATCATCACCC 3180

TGGGCGTGTC CATCCTGCTG AACGGGTTCC TCCTAAAAGG GATCGCTTCT GGCTCTGCTC 3240

TCGGACCCGG TCGTGCCGGA GGAGGAGGAG CTGCCGCCGC CGCCGCCGTC TTGCTCGGAG 3300

CGTGGGAAAT CGTCGATTGG AACAATGAGA CAGAGACCTC AACGAACACT CCGGCTGGTC 3360

CACCCGGCCA CAAGAACCAG AATGTCAACC TCCGACTCAG TCTCGAGCGG GATACTGGTC 3420

TCCTCCGTTA CCAGCGTGAG CAGGCCTACC AGGCCCAGTC TCAGATCCTC GCTCCTATTT 3480

CACCGGTCTC TGTCGCGCCC GTCGTCXCCA ACGGTAACGG TAACGCATCG AAATCGATTG 3540

AGAAACCAAT GCCTCGTTTG GTGGTCCCTA ACGGACCAAG ATCCTTGCCT GAATCACCAC 3600

CTTCGACGAC AGAATCAACC CCGGTCAACA AGGTTATCAT CGGTGGACCG TCCGACAGOC 3660

CTGCCCTAGA CGGACTCGCC AATGGAAACG GTGCCGTCCC CCTTGACAAA CAAACTGTGC 3720

TTGGCATGAG GTCGATCGAA GAATGCGAAG AAATTATGAA GAGTGGTCTC GGGCCTTACT 3780

CACTCAACGA CGAAGAATTG ATTTTGTTGA CTCAAAAGGG AAAGATTCCG CCGTACTCGC 3840
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TGGAAAAAGC ATTGCAGAAC TGTGAQCGGG CGGTCAAGAT TCGAAGGGCG GTTATCTGTA 3900 

GGTCTTTTTC TCCTTTGAAT TTCAAGCCTT GGAGGAGAGG AAAGTGCTTC GGGGTACAAT 3960 

ACAGGTTGTG CAAACAAACC AAGAGAAACT AAAGAAAACT TTCTTCTCCT CTCTCTCCCC 4020

TCGACGTCAG CCCGAGCATC CGTTACTAAG ACGCTGGAAA CCTCGGACTT GCCCATGAAG 4080 

GATTACGACT ACTCGAAAGT GATGGGCGCA TGCTGTGAGA ACGTTGTCGG ATATATGCCT 4140

CTCCCTGTCG GAATCGCTGG TCCACTTAAC ATTGATGQCG AGGTCGTCCC CATCCCGATG 4200

GCCACCACCG AGGGAACTCT CGTGGCCTCG ACGTCGAGAG GTTGCAAAGC GCTCAACGCG 4260 

GGTGGCGGAG TGACCACCGT CATCACCCAG GATGCGATGA CGAGAGGACC GGTGGTGGAT 4320 

TTCCCTTCGG TCTCTCAGGC CGCACAGGCC AAACGATGGT TGGATTCGGT CGAAGGAATG 4380 

GAGGTTATGG CCGCTTCGTT CAACTCGACT TCTAGATTCG CCAGGTTGCA GAGCATCAAG 4440 

TGTGGAATGG CCGGCCGATC GCTATACATC CGTTTGGCGA CCAGTACCGG AGATGCGATG 4500 

GGAATGAACA TGGCTGGTGA GTGCGACGAG TTTTCTTTGT TCTTCTTGTG CGGACCATGT 4560 

TTTCTCATCC AGCCAATTCA TTCTTCATTC CTTCTCGGTG TTTGGCAACC TTTTAGGTAA 4620 

AGGAACGGAG AAAGCTTTGG AAACCCTGTC CGAGTACTTC CCATCCATGC AGATCCTTGC 4680

TCTTTCTGGT AACTACTGTA TCGACAAGAA GCCrrCTQCC ATCAACTGGA TTGAGGGCCG 4740 

TGGAAAGTCC GTGGTGGCCG AGTCGGTGAT CCCTGGAGCG ATCGTCAAGT CTGTCCTCAA 4800

GACAACGGTT GCGGATCTCG TCAACTTGAA CrATTAAGAAA AACTTGATCG GAAGTGCCAT 4860 

GGCAGGCAGC ATTGGAGGAT TCAACGCCCA CGCGTCGAAT ATTTTGACTG TGCGTACTTC 4920 

TCTTTCCATA TTCGTCCTCG TTTAATTTCT TTTCTGTCCA GTCTTATGAC GTCTGATTGG 4980

TTCTTCTTTT CACCCACACA CATACAGTCA ATCTTCTTGG CTACAGGTCA GGATCCTGCA 5040 

CAGAATGTGG AGTCCTCAAT GTGCATGACA TTGATGGAGG CGTACGTTTT TTGTTTTGTT 5100

TTCCTTCTTT TTCCATATGT TTCTACTTCT ACTTTCTTCC CGAGTCCGCC AAGCTGATAC 5160

CTTTATACGG TCCTTCTCTT TCTCATGACG AGTAGTGTGA ACGACGGAAA AGATCTACTC 5220

ATCACCTGCT CGATOCCGGC GATCGAGTGC GGAACGGTCG GTGGAGGAAC TTTCCTCCCT 5280 

CCGCAAAACG CCTGTTTGCA GATGCTCGGT GTCGCAGGTG CCCATCCAGA TTCGCCCGGT 5340 

CACAATGCTC GTCGACTAGC AAGAATCATC GCTGCCAGTG TGATGGCTGG AGAGTTGAGT 5400

TTGATGAGTG CTTTGGCCGC TGGTCATTTA ATCAAGGCCC ACATGAGTAA GTCTGCCACC 5460 

TTTTGATAAT CAAAAGGGTC GTGGTACTGG TGTCACTGAC TGGTGACTCT TCCTGTCATG 5520 

CAGAGCACAA TCGATCGACA CCTTCGACTC CTCTACCGGT CTCACCGTTG GCGACCCGAC 5580 

CGAACACGCC GTCCCACCQG TCGATTGGAT TGCTCACACC GATGACGTCT TCCGCATCGG 5640

TCGCCTCGAT GTTCTCTGGG TTCGGTAGTC CGTCGACGAG CTCGCTCAAG ACGGTAGGTA 5700

GCATGGCTTG CGTCAGGGAA CGAGGGGACG AGACGAGTGT GAACOTGGAT GCCTGAACTG 5760 

GGGACTCCCT TTTCTTGGTA TCCCTTCCGT TTTTCTTTCG GCCTTTGAAT CCTGTATTCT 5820

TGTCCGTTTT TTCATCTTCT CTTCCTGGTT CTCCTTCTCT CGTTCATCTG CAAAAACAAA 5880

ATTCAATCGC ATCGGTCTCT GGCATTCCAT TTGGGTTTCA AAATCAAATC AATCTCTATC 5940

TACTATCTCA AATATCTTTT TTTCATCTTT TGATTCATTT CTGTTGAAAA CTGTCTTGCC 6000

CTTCTCCTAC TTCTTATCTC TGCCTTCTTG CCAAAGTTCA ATTCGTTGTC CATCTGTGCA 6060

CTCTGATCTA TCAGTCTGTA TCAAGTACGC TCTTAAATCT GTAATTGGCT CTCGGAGGTG 6120

TCTCGTCATC TCACATATGG CTGGCGATAT GATGTGTCGG TTTCTTCCCC TCCAACAAAG 6180

GCGACGTGGC TCCTTCATCA ATCTTTGGCG CAAGCTCTCA AAATTCTCCA AAACGGCTGA 6240

CTAAGCAAGG TTTCCAAGTA CTCTCAAACC GAGCAAGGCC ATCCATCCTC AAATCAACTT 6300

GTGAAACCCT TTGTGGATAG ACCGTCCAAA CCGAGCTCTT CCCAATCTTC GCCTCCCCTT 6360

CTTCCTGCAG 6370




(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4775 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma
(B) STRAIN: ATCC96594

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1305..1361

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1362..1504

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1505..1522

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1523..1699

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1700..1826

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1827..1920

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1921..2277

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(ix) FEATURE: 

(A) NAME/KEY: intron
(B) LOCATION: 2278..2351

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2352..2409

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2410..2497

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2498..2504

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2505..2586

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2587..2768

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2769..2851

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2852..2891

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2892..2985

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2986..3240

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 3241..3325

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 3326..3493

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 3494..3601

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 3602..3768

(ix) FEATURE:
(A) NAME/KEY: polyA_site
(B) LOCATION: 4043

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
CATCGAAGAG AGCGAACTTGA TTAGGGAAGC CGAAGAGGCA CTAACAACGT GGTTGTATAT 60 
GTGTGTTTAT GAGTGTTATA TCOTCAAGAA CGAAGTCCAT TCATTTAGCT AGACAGGGAG 
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AGAGGGAGAA ACGTACGGGT rTACCCTATT GGACCAGTCT AAAGAGAGAA CGAGAGTTTT 
TGGGTCGGTC ACCTK^GAG TTTGAACCTC CACAAGTTTA TTCTAGATTA TTTCCGGGGG 
TATGTGAAGG ATAA^MTCAA ACTTTGTCCA GATTGAAGAA GGCAAGAAAG GAAAGGGGCG 
AACGAGAGTA TCGTCCCATC TATGGGTGAC CAGTCGACCT TCTGCATCGG CGATCCCGAG 
AATGGAAGGT TCCGATGGAT CAGAAGTAGG TTTCCTAAGC TCAAACATAG GTCATTGCGA 
GTGAGATACA TATGCAGACT GATATGCTAG TCAAACCGAA CGAGATTTCT CTGTTTCCTT 
TCAAAAAGAC GAACCAACCA TTTCATGTCC AAGATGGCAG GTCCTTCGAT TCTTTGAAGC 
TCCrcCCTGA TGCGGACAGA AAAGAATAAA AAGTAGACAG ACTGTCAAGT CGACAGCGCA 
AGTTTATCAA GCTGAGCGAG AAAACTCGAA CTTACATACC TTCGCCGTCA GTTCTGTAGA 
CCAAGCATCG GCCTTTCCTC TTTGCGGCAG GTGTACGCGT T^CTCACCA TCGTCACICT 
CGTCTCCTGA CCCGTTGCTT TCCTTCACAG CAGTCTGTTC CACAGGTTTC TCTAACTGAT 
AGGTCCCAAC AGCAAAGATA TCTGGATGTC TATGTGAGAA CTCTACTGAG TCGGCAGAGT 
ACACCCTATC GATATAGGCG AGTGAGGAAG CTTTGAAAGG TGAAGAAGTA GCGAAAGATC 
ATCAGCGAAT GAGGACTATG ACAAAAAAGA AATTTTCGTA TAATCCACTG GACAAATCAC 
CrrCCATCGT GTCCTCCAAG AGGGTTTCGT CTGAAACGTA AGGACGAGGT ATTGATAGAT 
GATTGACCTT GAGTACGCGG ATGGACAAGG AACGAGCCCA CTCCCAGGGC TATGTAACAC 
CACACC^TGAC TCCACTTGAA TTGCGGCAGA TAAACGAAGT CrTACGATCG GACGACTTTG 
TAACCATTTA GTTATTTACC COTCTTGTTT TCTTACTTTG ATCGTCCCAT TTTAGACACA 
AAAAAAGAAG CCAGAAGAGA AAAGAATAAA ACGTCTACCG TGTICTCTCC GAAl^^TAC 
CACACCCACA AAACCATACA CAATCTCAAT CTAGATATCC AGTTATGTAC ACTICTACTA 
CCGAACAGCG ACCCAAAGAT GTrGGAATTC TCGGTATGGA GGTATCTTGT TCAATTCTGT 
TTGTGTTCAA TCTTTAATCA TCTTTAGTCG ACTGACCGGT TCTTCCTTTT rrm^TTCA 
TCAAACAAAA CAACCCrrCT CGATTCATCT CATCTTTCTT TCCAATGCGC TACTCCTTCT 
GTAGATCTAC TTTCCTCGAC GAGTGCGTAA CTATTCTCTC TTCTGCATTC TCTCTCTATT 
CCCATGTTCG ATCCCTCGCC CTCATATGGG CGACTCTTTC ATCTCTTTTG CTTCCGTCCA 
^TTCTTTG ATCrrGlTCA TTTTCTACTA ATATCTCCCG ACGCGAAATA CAACACTGAC 
CGCGATTTCT CTCGATCAGG CCATCGCTCA CAAGGATCTC GAGGCTTTTG ATGGGGTTCC 
TTCCGGAAAG TACACCATCG GTCTCGGCAA CAACTTCATG GCCTTCACCG ACGACACTGA 
GGACATCAAC TCGTTCGCCT TGAACGGTCA GTCTCTTCCG TTTCAGCAAT CGACAGGAAA 
^VAGGCCCAAG CGCATCTCAC TGACACCTTT CTCCGTTTTG CAATTCCATT TGATrGTTAG 
CTGTTTCCGG TCTTCTATCA AAGTACAACG TTGATCCCAA OTCAATCGGT CGAATTGATG 
TCGGAACTGA GTCCATCATT GACAAGTCCA AATCTGTCAA GACAGTCCTT ATGGACTTGT 
TCGAGTCCCA CGGCAACACA GATATTGAGG GTATCGACTC CAAGAATGCC TGCTACGGTT 



180 
240 
300 
360 
420 
480 
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55 



CTACCGCGGC CCTGTTCAAT GCCGTCAACT GGATCGAGTC ATCCTCTTGG GACGGAAGAA 2160
ATGCCATTGT CTTCTGCGGA GACATTGCCA TCTACGCCGA GGGTGCTGCC CGACCTGCCG 2220
GAGGTGCTQG TGCTTGCGCC ATCCTCATCG GACCCGACGC TCCCGTCGTC TTCGAGCGTG 2280
AGTTCCAATC CGTCATTTTC TTCCACGGCA GCGGCTGAAA CAACCCTTAT CCGTCATTCT 2340
CATCAATCTA GCCGTCCACG GAAACTTCAT GACCAACGCT TGGGACTTCT ACAAGCCTAA 2400
TCTTTCTTCG TATGTTCAAA TTTTGAAGTT TGCGCTTGGG AGAGTCTTAC ACTAATTCGG 2460
GGTGCTCGTA TCCTTCGAAT CGTTTGTTGC TTTATAGTGA ATACGTTCGT CTGCGCACCT 2520
CCTATATTTA GTTTTTGATC AAATATTGTC CATTGAATTA ACTCTGAAAC CTTCTCCTCC 2580
AAATAGCCCA TTGTCGATGG ACCTCTCTCC GTCACTTCCT ACGTCAACGC CATTGACAAG 2640
QCCTATGAAG CTTACCGAAC AAAGTATGCC AAGCGATTTG GAGGACCCAA GACTAACGGTT 2700
GTCACCAACG GACACACCGA GGTTGCCGGT GTCAGTGCTG CGTCGTTCGA TTACCTTTTG 2760
TTCCACAGGT AAGCGTCATC TTCTGTATTC TCCTTAAATT CAACCGATCA ACGGAGTTAA 2820
TTCGTGTCAT CATATTATCT TGTTGGAACA GTCCTTACGG AAAGCAGGTT GTCAAAGGCC 2880
ACGGCCGACT TGTAAGCAGT CTTTTTGTAA CTCTTAGCTT GCAGATAAAA ACTTTTAGGT 2940
TTCTGGTACT CATTATTTAT GCATCTCTTG AATCACCTTA TCTAGTTGTA CAATGACTTC 3000
CGAAACAACC CCAACGACCC GGTTTTTGCT GAGGTGCCAG CCGAGCTTGC TACTTTGGAC 3060
ATGAAGAAAA GTCTTTCAGA CAAGAATGTC GAGAAATCTC TGATTGCTGC CTCCAAGTCT 3120
TCTTTCAACA AGCAGGTTGA GCCTGGAATG ACCACCGTCC GACAGCTCGG AAACTTGTAC 3180
ACCGCCTCTC TCTTCGGTGC TCTCGCAAGT TTGTTCTCTA ATGTTCCTGG TGACGAGCTC 3240
GTAAGTCTTG ATCTCTATCC CAATCATCTC TTCCTTATCA ATTGAACTGA ACTCTTTTCT 3300
TTAATGCTGG CTTTCTCTTG AACAGGTCGG CAAGCGCATT GCTCTCTACG CCTACGGATC 3360
TGGAGCTGCT GCTTCTTTCT ATGCTCTTAA GGTCAAGAGC TCAACCGCTT TCATCTCTGA 3420
GAAGCTTGAT CTCAACAACC GATTGAGCAA CATGAAGATT GTCCCCTGTG ATGACTTTGT 3480
CAAAGCTCTG AAGGTACGTT GGATAATGAC TTTTTTTGTG GACCGTGGTC TTTGTCAACC 3540
GCTAACAACC TTCTTGAATC GGTCTCTTTT GGTTTGAAAT TCGCTCGGCG CTTCGACACA 3600
GGTCCGAGAA GAGACTCACA ACGCCGTGTC ATATTCGCCC ATCGGTTCGC TTGACGATCT 3660
CTGGCCTGGA TCGTACTACT TGGGAGAGAT TGACAGCATG TGGCGTCGAC AGTACAAGCA 3720
GGTCCCTTCT GCTTGAACGG GATATTAAAA GTTTCAAAAG TTATGAAAGA GGTCGGCGAA 3780
GATTCAAAAT AAATAAATAT AACACCTTGC TTTTTGGCTT GTTTTCCTTC TTCACTCTCG 3840
TTTCCGATGT GTTTCCTCCG TTTCTTCCCT CTTTTGTTCC TTTTTCCTCC CTCTTTTGGT 3900
TACAATCTCT TTGGGTTTTA CAGGCTGGCA ATCTCTGTAC AATCTTCGTT CGCGTGATCC 3960
GACATAGATA CCGTTGTGGC ATACACCTTG CGTCTTACAT CTTTTGAGAG CTTCGGAGGT 4020
GATCTTGATG AAGAAAATTC ACCATTGACT CCCATCTCTT GAATGTCCTG ACTAAATTGA 4080
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ATTCGAAGCA ACTTATATGA AGAGCAAATT GATGGATCCA OAAAGGAACA AGOXTTAGAAA 

TCAGTOArrr gtgcgaaaaa tcagcaaatg ccgcgctgag ccgct.:gctg gggac^agac 

ATTGCCCATG CGCOTOATGT TGTCTGACCG TTCTCCTCCA TTCCCCCACT CTCAACC™: 
CTCTCrrxGA GAATCGAAGA AGAAGGCGAA GAAAACCTGA CT..5ATCCTT TACAGGGTGT 

.TCrrrTGTT cgtatct.^ ttac^ttcc tcctttcctt cctgcttgag tgaatcactg 

ATCTGACTCC TCCGCCTACC TCGCCGACTG GGCTATATCT TGAGGATAGA ATATCCCCCT 
CACAATCCCA ^CTCAAGA rrCTTrC^ CAAGAAAACT AGTTCCAATC AATAGAO^AT 
CTGATCAACC TTC^TGAAC ATAA.CATCT GCAGAAGCAC TGAACTGAGA AAGTCTTCCT 
CAGAGGAAAG AGAATACTAG ATAAGA^AT ,K:GGrrGGGA AGGTAAAGGA A^GTCTG 
GTTCTGGGrr TAGCTCTGGT TCCGTAGGGG GT^GACTAT AGrTTCTTCT GTrCGACTAG 
^^CAGGAGA AACCGTACAT GTAAA^^A TGATATTCTT G^TCTGTAT CATGO^CCGC 
TCATCTCTTT GTTTGCAAGT CACTCTCGAG AATTC 

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4135 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma
(B) STRAIN: ATCC96594

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1021..1124

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1125..1630

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1631..1956

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1957..2051

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2052..2366

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2367..2446

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2447..2651

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2652..2732

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2733..3188

(ix) FEATURE:
(A) NAME/KEY: polyA_site
(B) LOCATION: 3284



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
ACTGACTCGG CTACCGGAAA ATATCTTTTC AGGACGCCTT GATCGTTTTG GACAACACCA 60
TGATGTCACC ATATCTTCAG CGGCCGTTGG AGCTAGGAGT AGACATTGTA TACGACTCTG 120
GAACAAAGTA TTTGAGTGGA CACCACGATC TCATGGCTGG TGTGATTACT ACTCGTACTG 180
AGGAGATTGG GAAGGTTCOT GCTTGCTTGC TTTGAATGTC GTGCCTAAAG CCATTGCCAT 240
AAGACAGAGT CTGATCTATG TCGTTTGCCT ACAACAGAGA ATGGCCTGGT TCCCAAATGC 300
TATGGGAAAT GCATTCTCTC CGTTCGACTC GTTCCTTCTT CTCCGAGGAC TCAAAACACT 360
TCCTCTCCGA CTGGACAAGC AGCAGGCCTC ATCTCACCTG ATCGCCTCGT ACTTACACAC 420
CCTCGGCTTT CTTGTTCACT ACCCCGGTCT GCCTTCTGAC CCTGGGTACG AACTTCATAA 480
CTCTCAGGCG AGTGGTGCAG GTCCCGTCAT GAGCTTTGAG ACCGGAGATA TCGCGTTGAG 540
TGAGGCCATC GTGGGCGGAA CCCGAGTTTG GGGAATCAGT GTCAGTTTCG GAGCCGTGAA 600
CAGTTTGATC AGCATGCCTT GTCTAATGAG GTTAGTTCTT ATGCCTTCTT TTCGCGCCTT 660
CTAAAATTTC TGGCTGACTA ATTGGGTCGG TCTTTCCGTT CTTGCATTTC AGTCACGCAT 720
CTATTCCTGC TCACCTTCGA GCCGAGCGAG GTCTCCCCGA ACATCTGATT CGACTGTGTG 780
TCGGTATTGA GGACCCTCAC GATTTGCTTG ATGATTTGGA GGCCTCTCTT GTGAACGCTG 840
GCGCAATCCG ATCAGTCTCT ACCTCAGATT CATCCCGACC GCTCACTCCT CCTGCCTCTG 900
ATTCTCCCTC GGACATTCAC TCCAACTGGG CCGTCGACCG AGCCAGACAG TTCGAGCGTG 960
TTAGGCCTTC TAACTCGACA GCCGGCGTCG AAGGACAGCT TGCCGAACTC AATGTAGACG 1020
ATGCAGCCAG ACTTGCGGGC GATGAGAGCC AAAAAGAAGA AATTCTTGTC AGTX^ACCGG 1080
GAAAGGTCAT TCTGTTCGGC GAACATGCTG TAGGCCATGG TGTTGTGAGT GAGAAATGAA 1140
AGCTTTATGC TCTCATTGCA TCTTAACTTT TCCTCGCCTT TTTTCTTCTC TTCATCCCGT 1200
CTTGATTCTA GGGATGCCCC CCTTTGCCCC TTTCCCCTTC TTGCATCTGT CTATATTTCC 1260
TTATACATTT CGCTCTTAAG AGCGTCTAGT TGTACCTTAT AACAACCTTT GGTTTTAQCA 1320
TCCTTTGATT ATTCATTTCT CTCATCCTTC GGTCAGAGGC TTTCGGCCAT CTTTACGTCT
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r^^^ — 
..^^o^ rra^o cctcttx. rrr^ atc.tc^ ggto^ 

c;.TCCCo;. ^-tt«c tgctc^^xc ^rc^ « ^oxa^ 

CCTG.TACGT GTCTCTAT^T Gr«rGG«r<^ ACGTCCGAAT T^TGACTCGA CGTTCTACGC 
„^C.G «:CC<^TC. CTOCTTCO^ ^-T^ ^-^^^ "^^^^ 

c;«:toct.cg aou^t ctcotcott .tc«c..c. a.c.t..cc. tctcccxa«: 

«^^C TTT;^^ CTTCOCC^ CCT^C 

ccg;.^. oc<^..«: c^tctct ct«:cc«.ca t^ccco aa.tc<^c 

OATCOC^T CAAGCTO^ ACOOACOAC- AAGOGAOAAC C«^CA TCGCA™:" 
CTATT^rrC GTGCTATT.3A OCAAAGGOAA GCCAAGGTAG <.rTTTr«:TG TCTCTTCTTT 
™.CTATAA a.ac«:ttaa C^^COGAOA AA<»^ "TCTTCCT. CGGOCOT^ 
.^^„AAA O^OAOCCCTX CC^ CC^-™ 

e^^AT CCOCCGCOCT ATCGACCTCT CO^CCTAa TCTTTCTrCT CCACTT^T 
C;«:CTCAC«C CAACO^CGAC tGCCA«CAA TCAACAATCC CGACCCCCOA CACACAA.TA 
..^^ACAAAT <»OCC^rr AGC^AAAAA G^TCCATO GAAATCCGAO ««»ATTGAT 
^.CGCOGTCA GTACGAOAGG AOOCGCT^TT CCTTrCAAAA GAAAGATXGA GCGAAAACAO 
^•rO^ ^OCCAT CAAOAC^AC GCAOACACaC TOC^TAT CCCATAC^C 
.O^TGATTC ACCCA^^TO A^OTCTm TACATTTCGA ATATAGCT«: ACATCCATTC 
^rrCC^. CACAOA^CT CGTAXCGGAA GOOATACAAC AOCTCXCG- CCA<^-- 

.«.«:gact gattcagg^g ccagaggtga tcgtccctt, <n^o at^^ 

™.:CGAXGA GG«AT«:GA TGC««AAAG ArrCAGAGA. GGAACG^ GTCATGA^G 
ATCGACTTCA AGTTAGTTCT ^CTTTC AAGACTCTTT GTOACA^^T GTCTTA«:CA 

rmrrcnc cttcttctgc agaacttggt ctccgagaac cacgcacacc 

.AGCAGCACT TGGCGTG«:C CACCCATCCC TCGAAGAGAT TA^C^- <-raCTGA- 

^.rrrcc^ gcttcgaaca aag^tgacag gcgccggtgg aogtg^ttgc gctgtaaccc 

^^CCGA TGGTAAAGTC TCTCCT««: TC.XCCGTCC AAGCGACACA TC^ACCGAT 
OCGCA«:C« TAC^^r CAACCAGACT TCTCGACTOA AACCCTOCAA GC«:TTAT0G 
TCAATCATCG TOCGCCCCTT ATATOGCCCG AGTGGG.<^ TCAGGCGTCG 
OATTCCTXTC ATCAACTAAG GCCGATCCGG AAGATGGGGA GAACAGACT. AAAGATGGGC 
OOAGATTGAT GAGCAGACA GA^-T GAAAACGG^ CG^^GTCTT 
^XrrGAAC GAAAGATAGG A«.CGGTGAC TAGG<«^G ATCCmGCT GTCATTTTTA 
CAAAACAcrr TCTTATGTCT tcatgac«:a ACGTATGCCC TCATCTCTAT CCATAGACAG 
^^C. CTCAGGTTTC AATACGTAAG CGTTCAXCGA CAAAACTGC GGCACACGAA 
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AACGAGTGGA TATAAGGGAG AAGAfiAGATA TTAGAGCGAA AAAGAGAAGA GTGAGAGAGG 
AAAAAAATAA CCGAGAACAA CTTATTCCGG TTTOTTAGAA TCGAAGATCG AGAAATATGA 
AGTACATAGT ATAAAGTAAA GAAGAGAGGT TTACCTCAGA GGTGTGTACG AAGGTGAGGA 
CAGCTAAGAG GAATAATTGA CTATCGAAAA AAGAGAACTC AACAGAAGCA CTGGGATAAA 
GCCTAGAATG TAAGTCTCAT CGGTCCGCGA TCAAAGAGAA ATTGAAGGAA GAAAAAGCCC 
CCAGTAAACA ATCCAACCAA CCTCTI^AC GATTGCGAAA CACACACACG CACGCGGACA 
TATTTCGTAC ACAAGGACGG GACATICTTT TTTTATATCC GGGTGGGGAG AGAGAGGGTT 
ATAGAGGATG AATAGCAAGG TTGATGTTTT GTAAAAGGTT GCAGAAAAAG GAAAGTGAGA 
GTAGGAACAT GCATTAAAAA CCTGCCCAAA GCGArTTATA TCGTTCTTCT GTTTTCACTT 
CTTTCCGGGC GCTTTCTTAG ACCGCGGTGG TCAAGGGTTA CTCCTGCCAA CTAGAAGAAG 
CAACATGAGT CAAGGATTAG ATCATCACGT GTCTCATTTG ACGGGrTGAA AGATATATTT 
AGATACTAAC TGCrTCCCAC GCCGACTGAA AAGATGAATT GAATCATGTC GAGTGGCAAC 
GAACGAAAGA ACAAATAGTA AGAATCAATT ACTAGAAAAG ACAGAATGAC TAGAA

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2767 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma
(B) STRAIN: ATCC96594

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 401..451

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 452..633

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 634..876

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 877..1004

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1005..1916

(ix) FEATURE:
(A) NAME/KEY: polyA_site
(B) LOCATION: 2217
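
The exon locations listed for SEQ ID NO: 4 can be checked arithmetically. The following is a minimal Python sketch, not part of the original listing, that computes the length of each exon and of the spliced coding region from the 1-based, inclusive coordinates given above. The names `EXONS_SEQ4`, `exon_lengths` and `splice` are hypothetical helpers introduced only for illustration.

```python
# Exon coordinates (1-based, inclusive) as given in the FEATURE entries
# for SEQ ID NO: 4. The variable and function names are illustrative only.
EXONS_SEQ4 = [(401, 451), (634, 876), (1005, 1916)]


def exon_lengths(exons):
    """Return the length of each exon for 1-based inclusive coordinates."""
    return [end - start + 1 for start, end in exons]


def splice(genomic, exons):
    """Concatenate the exon segments of a genomic sequence string.

    Coordinates are 1-based and inclusive, so each (start, end) pair
    maps to the Python slice genomic[start - 1:end].
    """
    return "".join(genomic[start - 1:end] for start, end in exons)


lengths = exon_lengths(EXONS_SEQ4)
total = sum(lengths)
# total % 3 is 0 when the spliced region is a whole number of codons.
print(lengths, total, total % 3)
```

Under these coordinates the three exons sum to 1206 nucleotides, a whole number of codons. `splice` could be applied to the genomic sequence once it has been assembled into a single string, but any characters still garbled by text extraction would carry over into the result.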

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GAATTCTTCC CGACTGGGCT GATCGACTTG ACTGGAAGAT CTAAGGCGGA GGGATGAAGG 60
AAGTAATTGG AGGGAATGAG GAAAAAAAAA GGCGAGGGAA CGCGGTCTTC TTTCCTGQCA 120
AGGCAATGTC GTGTATCTCT CTTGATTCTT TCGTTGTATC GACGGACCAC ACTCTTTTCG 180
AATGAATATC ACTATCGCAT CCAATGATCG CTATACATGG CATTTACATA TGCCAGACAT 240
CGCTCAGAAA GAGAGAACAT TCCTTTGGAA AAAGCCTACT GTGCCTGAAG TCAGGCTGAT 300
GTTGATTAAA CGTCTTTCCC CATCCTAAGC AGACAAACAA CTTCTTTTCG TTCAACACAC 360
CACCTCTCTC CGAAAAAGCT CTTCAATCCA GTCCATTAAG ATGGTTCATA TCGCTACTGC 420
CTCGGCTCCC GTTAACATTG CGTGTATCAA GGTCCGTCTG CATTGTGAAT GCTGCTCGTT 480
TGCCTTGTGT GCGTTTGGTG GATCTGAAAG AACCCTTGCT TGAACCATTC CATCTCTGCT 540
CTTTTTCTTC CTGTCCTTTC CTTTTTCTCA CGACAAAAAA ACCACCTGGA CCCTTTGTGT 600
TCCTTTCCAT TGGTGTTCAT ACACCTAACA CAGTACTGGG GTAAACGGGA TACCAAGTTG 660
ATTCTCCCTA CAAACTCCTC CTTGTCTGTC ACTCTCGACC AGGATCACCT CCGATCGACG 720
ACGTCTTCTG CTTGTGACGC CTCGTTCGAG AAGGATCGAC TTTGGCTTAA CGGGATCGAG 780
GAGGAGGTCA AGGCTGGTGG TCGGTTGGAT GTCTGCATCA AGGAGATGAA GAAGCTTCGA 840
GCGCAAGAGG AAGAGAAGGA TGCCGGTCTG GAGAAAGTGA GTTTTTCTCC TGTGTGCGTG 900
TGTACTCTGT ATAGGTACCG TTGACAGGAC AGTCTTTCTG AAGAGTTTGG ATCTTACTCT 960
TTTTTGGGGG GGTGGTGGTG TTTGAAATAA TGACCAAAAT AAAGCTCTCA TCTTTCAACG 1020
TGCACCTTGC GTCTTACAAC AACTTCCCGA CTGCCGCTGG ACTTGCTTCC TCCGCTTCCG 1080
GTCTAGCTGC GTTGGTCGCC TCGCTCGCCT CGCTCTACAA CCTCCCAACG AACGCATCCG 1140
AACTCTCGCT CATCGCCCGA CAAGGTTCTG GTTCTGCCTG CCGATCGCTC TTCGGCGGGT 1200
TCGTTGCTTG GGAACAGGGC AAGCTTTCCT CTGGAACCGA CTCGTTCGCT GTTCAGGTCG 1260
AGCCCAGGGA ACACTGGCCC TCACTCCACG CGCTGATCTG TGTAGTTTCC GACGAGAAAA 1320
AGACGACGGC CTCGACGGCA GGCATGCAAA CCACGGTGAA CACCTCGCCT TTGCTCCAAC 1380
ACCGAATCGA ACACGTCGTT CCAGCCCGGA TGGAGGCCAT CACCCAGGCG ATCCGGGCCA 1440
AGGATTTCGA CTCGTTCGCA AAGATCACCA TGAAGGACTC CAACCAGTTC CACGCCGTCT 1500
GCCTCGATTC GGAACCCCCG ATCTTTTACT TGAACGATGT CTCCCGATCG ATCATCCATC 1560
TCGTCACCGA GCTCAACAGA GTGTCCGTCC AGGCCGGCGG TCCCGTCCTT GCCGCCTACA 1620
CGTTCGACGC CGGQCCGAAC GCGGTGATCT ACGCCGAGGA ATCGTCCATG CCGGAGATCA 1680
TCAGGTTAAT CGAGCGGTAC TTCCCGTTGG GAACGGCTTT CGAGAACCCG TTCGGGGTTA 1740
ACACCGAAGG CGGTGATGCC CTGAGGGAAG GCTTTAACCA GAACGTCGCC CCGGTGTTCA 1800
GGAAGGGAAG CGTCGCCCGG TTGATTCACA CCCGGATCGG TGATGGACCC AGGACGTATG 1860
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GCGAGGAfiGA GAGCCTGATC GGCGW«3ACG GTCTGCCAAA GCjrCGTCAAG OCrTAfiftCTA 

tw;ottgttt CTTCTAAATT tgagccttcc tcccgcctcc cttccacaag cataaaacaa 

AGGATAAACA AATGAATTAT CAAAATAACT ATAGGTOTT TCTTCTAAAT TTGAGCCTTC 
CTCCCGCCTC CCTTCCACAA GCATAAAACA AAGGATAAAC AAATfiAATTA TCAAAATAAA 
ATAAAAAOTC TCCCTTCTTT CTTTTGGAAT ACATCTTCTT TCGGACATGA CCCTTCTCCT 
TCTTTTCCGT ATACATCTTT TTGGGTArrr CATGGTGATC AAACAACATT GTGATCGAAA 
GCAGAGACGG CCATCffTGCT GGCTTTGAGC GTCTGGCGTT TTGTGTGTCC TGCACTTCAG 
CAACCCCAAG CTGACCGCTA GGAAAACTCA TTGATCT6AT TTATATCGXA CGATGAAAGA 
GAATAAAATG ATAGAAGAAC AAAGAAGAAC AAAGTAGAAG AACGTCTGAG AAGAAAGACA 
6GAAAATGAC ACGTACATAG TCTTCGATGA TCAATCATAT AATATTAAAT ATAAAATGAG 
GTAAACGTAT AGCATCACGG GAT6AACGGA TGAACATCTA GTGGftCAAOG TTGGGAAATA 
GGAATCTAGA ATCCAAGAAT CCTT6ACTGA TG6AC06ACG TATGTAAACA GGTACACCCC 
AAGAAAGAAA GAAAGAAAAC ACAAAGCCAA GGAAGTAAAG CAGATGGTCT 
TCTAAGAATA CGGCTTCAAA AAGACAGTGA ACACTCGTCG TCGAGGAATG ACAAGAAAAG 
TGAGAGACTA CGAAAGGAAG AAACCAAGAC GAAAAGAAGA ACGGAGATCG AACGGACAGA 
AATAAAG 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4092 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma
(B) STRAIN: ATCC96594

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 852..986

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 987..1173

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1174..1317

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1318..1468

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1469..1549

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1550..1671

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1672..1794

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1795..1890

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1891..1979

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1980..2092

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2093..2165

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2166..2250

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2251..2391

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2392..2488

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2489..2652

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2653..2784

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2785..2902

(ix) FEATURE:
(A) NAME/KEY: polyA_site
(B) LOCATION: 3024
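
A feature table of this kind can be checked for internal consistency: each exon should be followed directly by an intron and each intron by an exon, with no gap or overlap between adjacent locations. The following is a minimal Python sketch of such a check for SEQ ID NO: 5, offered as an illustration only. The names `FEATURES_SEQ5` and `check_tiling` are hypothetical, and the two entries whose NAME/KEY is illegible in the text extraction are assumed to be introns because they fall between two exons.

```python
# Exon and intron entries for SEQ ID NO: 5 as (kind, start, end),
# with 1-based inclusive coordinates taken from the FEATURE table.
# Assumption: the two illegible NAME/KEY entries (2392..2488 and
# 2653..2784) are introns, since each lies between two exons.
FEATURES_SEQ5 = [
    ("exon", 852, 986), ("intron", 987, 1173),
    ("exon", 1174, 1317), ("intron", 1318, 1468),
    ("exon", 1469, 1549), ("intron", 1550, 1671),
    ("exon", 1672, 1794), ("intron", 1795, 1890),
    ("exon", 1891, 1979), ("intron", 1980, 2092),
    ("exon", 2093, 2165), ("intron", 2166, 2250),
    ("exon", 2251, 2391), ("intron", 2392, 2488),
    ("exon", 2489, 2652), ("intron", 2653, 2784),
    ("exon", 2785, 2902),
]


def check_tiling(features):
    """Return a list of messages describing gaps, overlaps or repeated kinds."""
    problems = []
    for (k1, s1, e1), (k2, s2, e2) in zip(features, features[1:]):
        if s2 != e1 + 1:
            problems.append(f"{k1} {s1}..{e1} is not adjacent to {k2} {s2}..{e2}")
        if k1 == k2:
            problems.append(f"two consecutive {k1} entries at {s1}..{e1} and {s2}..{e2}")
    return problems


# Total exon length; a multiple of 3 means a whole number of codons.
coding_length = sum(e - s + 1 for kind, s, e in FEATURES_SEQ5 if kind == "exon")
print(check_tiling(FEATURES_SEQ5), coding_length, coding_length % 3)
```

With the coordinates as listed, the check reports no gaps or overlaps, and the nine exons total 1068 nucleotides, a whole number of codons.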



(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
C«:CC0C^.T CT^OCCACXO XTOCCOCCGG .GTGTCTGOC GG^i- <^<^ 
.^CTCT C^GAGCAAfi CGTACCACAA GCTAGCTCTT CGTC^HOG A«;GACA«:C 
;^CCACCTTC CTGGCCTTCG GGGATGGCAC C^CTCGTCG ACTTCCCATG GCCGXGCCCC 



120 
180
CTCTCATCGT GATCGTATTG ACCACATCAT GCGATGTTTG ACTTTCTCGT AGCX^CCACC 
TCATCGTTGT TTACAAGACC GCTTTCTATT CATTCTACCT TCCTGTCGCA CTCGCTATGC 



TGGCCTTGTO AAGATACTGT TTGCCAAGCT GAGCGCX:TCC CCGCTGCTCC AGGTCCGCAA 240
GGTCCGAGAG TArTGGACCT CGAAGATATG TTCAAAGTGT CAGGCGAGTT CTCGGGAGAA 300
AAAAAAAGCG TGSGCTCTGA AACAGTGTGG AAATGTCTAC AAAGTGAGCT GGATTTATTG 360
TCTCTGTATG TCTGTGTGTG TCTATCTTCT GTCTTGGTTG CTCACTGTAC TCTATGCTCT 420
CTCITAGATT TGGGGAACAG TCCTGTGAAC GCGTCGCGAA ACATGCTGCA CCTAGCCCTT 480
CACCAGAAGG AGAACCAGAG GGCGGGAATG CTGGTGTCTG ACGCTGCTAC TGCTGCTACG 540
CTAQCCGCTG AGGCTGAGGC TCGCAGAAAC TAAATCCATG ACCCATCASA TCTTGGTGAT 600
TCffTOGTCTG AGGACACCCA AGTCCAAAAG GGCTATATAT CGACCATCAT CCOTTQCGGT 660
CACTCAGTAG TAACTAAAGC TATACATAGG AATGTTCTGA ACTTGATAAC CCTAACACTA 720
CGAftAATATC TCGGAAAATA GATTAATTTC CrTCTCATCT CAAACAAAAG ACACAACACC 780
ATCAATCACG CTCCTTTCAC ACACTCTCCT TTTTGCTCTC TCGTTCGACA GAAAATAACA 840
TCAATAGCCA AATGTCCACT ACGCCTGAAG AGAAGAAAGC AQCTCGAGCA AAGTTCGAGG 900
CTCICTTCCC GGTCATTQCC GATGAGATTC TCGATTATAT GAAfiGGTGAA GGCATGCCTG 960
CCGAGGCm GGAATGGATG AACAAGGTTC OXCAAGGGrf TCTrCTTTAT TCTTCTGGTC 1020
TTKyrrrCGG TCGAACTQGC TTTCGAACTT QGCCTTGACC GGTTGGATCT CGGITGrrGC 1080
GCCAAAACGA TGTCGAAQCA AAACTTACTC TTACCTGTrC GGTTTCCTTC CTTCCGACCT 1140
TCTCTCTACC CTTQCCTCCG ATCGCTCTTA TAGAACTTGT ACTACAACAC TCCCGGAGGA 1200 
AAACTCAACC GAGGACTTTC CGTGCTGGAT ACTTATATCC TTCTCTCGCC rrCTGGAAAA 1260 
GACATCTCGG AAGAAGACTA CTTGAAGGCC GCTATCCTCG GTTCGTGTAT C6AGCTTGTA 1320 
CGCGTTTTCT TCATTCACCT TrCTTTCTCG TCTTCTACTC TCTTCTCTCG AACTATCTTC 1380 
CCTGCGTffTC ATCCTACACG AATCTTTATA CTTACATGTT GGAACATATG CCCTCTTCTT 1440 
AATTCACCTC rrrTGTCTCG GATGGTAGCT CCAAGCTTAC TTCTTCGTGG CTGATGATAT 1500
GATCGACGCC TCAATCACCC GACGAGGCCA ACCCTGTTGG TACAAAGTTC TTAGTCCCTT 1560 
CTTCTCTTTC TOTCCTCTTT CTKrTCAGCT ATCCCAATTC TT6ATTGAAA TCGGTOGTGC 1620 
CGTCCGGACT AATCCGTTTG TCGTTTTTAT CATATCTTCT TGCACAAACA GGAGGGA6T6 1680 
TCTAACATTG CCATCAACGA CGCGTTCATG CTCGftGGGAG CTATCTACTT TTTGCTCAAG 1740 
AAGCACTTCC GAAAGCAGAG CTACTATCTC GATC-TOCTAG AGCTCTECCA CGATGTTTCT 1800 
CTCTATITCT TTTCTTCCTC CCCTCAATAA ACTOTATTTG TGACCATTCT GGATCCTOTC 1860
CTGACGAT6A ATCATTCTTC OSATGAGTAG GTTACTTTCC AAACCGAGTT GGGACAGCTC 1920 
ATCGATCTGT TGACCGCTCC -TOAGGATCAC 6TCGATCTC6 ACAAGTTCTC CCTTAACAAG 1980 
TATGCCCGTC ATATATTCGT TTTGTTGCAT TCACGTCTGA TTGTCAGCTC CGATTATTGA 2040 
»,-<i«p<n~'prf:T AGOIACCACC 2100 



2160

2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 



6AATCGTGGG TCTCaCTCTT CAACTCTTCT TCCTCATTTT CTTGACCATC TCTAACATAA 
ATCCTTGGAA TTTTGAACTC TAICTCATAG GTCGGCGTGA CAGATGAGGA GGCGTACAAG 
CTTGCGCTCT CGATCCTCAT CCCGA-TOGGT GAATACTTTC AAOTICAGGA TGATGTGCTC 
GACGCGTTCG CTCCTCCGGA GATCCTTGGA AAGATCGGAA CCGACATCTT GGTGCGTTTT 
CGTTCCTTCC TTCTACGTTC TCTTrTCTAT CTTCTGACTC CCCGTCCATC ArTTATGCTT 
CTGTTAAAAC GTATTCAAAC ATCAAAAGGA CAACAAATGT TCATGGCCTA TCAACCTTGC 

ACTCwrrcrc gcctcgcccg ctcagcgaga gattctcgat acttcgtacg gtcagaagaa 

CTCGGAGGCA GAGGCCAGAG TCAAGGCTCT GTACGCTGA6 CTTGATATCC AGGGAAAGTT 
CAACGCTTAT GAGTATOTCA TCTTTTTTAA ATTTTCTAAT TTTCTTTTCA TCTCTTGTTC 
CCAAGAATTA TrTTCTGAAA GTTCT«56AC TSAACATGGT GCATCCCTTT GGGTTCACTC 
CGCATATOTC TCCCOTTTGA ATAGGCAACA GAGTTACGAG TCGCTGAACA AOrTGATTGA 
C^GTATTGAC GAAGAGAAGA OTGGACTCAA GAAAGAACTC WCCACAGCT TCCTGGGTAA 

G(m:TATAAG CGAAOCAACT AAtTCTCCTC TTTATATGCA AAGGGAAGAT TTTGGCGGGA 2940 

GTGATAGGTA GGAAGAGAA6 OGAGGCTTCAT ATTCATTAGG CATTTCTCTT GCAGATATAG 3000 

A-TGATCAAAA AGGGATATCG GTCCTCTTCT TTGTTCCGAA TACATAATAA GTCATACGAA 3060 

GCCGAACATG ACAAAACTGG TTCATGAGAT CAAACTTTTr GCATGATCTT CTGCGATTTT 3120 

GTACAATTCT CTCGCATCCT ATTAGGATC6 AACCAGGAGA AGATSAGAGA AGGAAACCCT 3180 

CACCCCGTCA GATAACAAAC GAGAAGTCTC ATCACACACA CACACAGATG AAAGAGAAAA 3240 

ATAAACTGAC GAGGATAACT TCCAATCCGA TTTTTCCAGC CCACGAACCT TCCTIGGTCC 3300 

CCGCPCCGGT GCCTTCGAGT CCGATCAATG GGGCCCAAAC GCCTGAAGAT CCAAAGAACC 3360 

CTTGTTGAG6 TGTATTTCK CTCTGAGCAA TCTTAGATCC TTCAATTMC AGTCGCGCAT 3420 

ATATACCATC AACAICATCG TCATCACCAT CATTCTCCJIC CACAACAGCA CC6CAACGCC 3480 

OTTAATGGCA GGGCTT66AC AACTTGAGGC GtjrTTCTAOC AGGTCGGACC GATTGGAGCT 3540 

CGACCCA0G6 TGCACATCAC CAAGACACAT TCTCCTTCAA AT6AGCGAAC AAGACATAAT 3600 

OAGGGAAGTA CTACGCTAXC GAACGTCTK TCACATCCCG GGTrcrrGGC GTATCTTT«5 3660

GCGATTCTTT TTGTTGAAAT AGAAAATTGA AGAGAAAAAA AGAGATCCAC ATGATGAAGA 3720

ACGGCTCTGT AGATICATGC TCGAAAGAAA GAAAGAAAGA AAAAGAGGGG AACGAACGGA 3780 

TCTGAATCTG TGGCCAACCA AAAAGTAGGC ACAAAGATGA CAACAGCGCC CTCTTCGACA 3840 

AGTCTTTGAA CTGCTTOTGG ATGAGACAAG TCCCAfiCAGA ICAACATTCC TGCTTTACCC 3900 

CATGGAGTAT CAAAC*CCTG AGAATAGGTC TTGCCCGGCT GTAGATAATC TCTGGACCGT 3960 

CATATGCGCG AAACGATCAG TACGACCGAC KTACTCGAA 6TCGTCAAGA GCACGGACGA 4020 

GAACGAAAAG AGGACAAACC GCTCTOGATG CCATAAATTT CTCTTCTCAT ACCICTCCCA 4080

CCCACCCTCA GG 4092
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(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1091 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma
(B) STRAIN: ATCC96594

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Tyr Thr Ile Lys His Ser Asn Phe Leu Ser Gln Thr Ile Ser Thr
1 5 10 15

Gln Ser Thr Thr Ser Trp Val Val Asp Ala Phe Phe Ser Leu Gly Ser
20 25 30

Arg Tyr Leu Asp Leu Ala Lys Gln Ala Asp Ser Ala Asp Ile Phe Met
35 40 45

Val Leu Leu Gly Tyr Val Leu Met His Gly Thr Phe Val Arg Leu Phe
50 55 60

Leu Asn Phe Arg Arg Met Gly Ala Asn Phe Trp Leu Pro Gly Met Val
65 70 75 80

Leu Val Ser Ser Ser Phe Ala Phe Leu Thr Ala Leu Leu Ala Ala Ser
85 90 95

Ile Leu Asn Val Pro Ile Asp Pro Ile Cys Leu Ser Glu Ala Leu Pro
100 105 110

Phe Leu Val Leu Thr Val Gly Phe Asp Lys Asp Phe Thr Leu Ala Lys
115 120 125

Ser Val Phe Ser Ser Pro Glu Ile Ala Pro Val Met Leu Arg Arg Lys
130 135 140

Pro Val Ile Gln Pro Gly Asp Asp Asp Asp Leu Glu Gln Asp Glu His
145 150 155 160

Ser Arg Val Ala Ala Asn Lys Val Asp Ile Gln Trp Ala Pro Pro Val
165 170 175

Ala Ala Ser Arg Ile Val Ile Gly Ser Val Glu Lys Ile Gly Ser Ser
180 185 190

Ile Val Arg Asp Phe Ala Leu Glu Val Ala Val Leu Leu Leu Gly Ala
195 200 205

Ala Ser Gly Leu Gly Gly Leu Lys Glu Phe Cys Lys Leu Ala Ala Leu
210 215 220

Ile Leu Val Ala Asp Cys Cys Phe Thr Phe Thr Phe Tyr Val Ala Ile
225 230 235 240

Leu Thr Val Met Val Glu Val His Arg Ile Lys Ile Ile Arg Gly Phe
245 250 255

Arg Pro Ala His Asn Asn Arg Thr Pro Asn Thr Val Pro Ser Thr Pro
260 265 270

Thr Ile Asp Gly Gln Ser Thr Asn Arg Ser Gly Ile Ser Ser Gly Pro
275 280 285

Pro Ala Arg Pro Thr Val Pro Val Trp Lys Lys Val Trp Arg Lys Leu
290 295 300

Met Gly Pro Glu Ile Asp Trp Ala Ser Glu Ala Glu Ala Arg Asn Pro
305 310 315 320

Val Pro Lys Leu Lys Leu Leu Leu Ile Leu Ala Phe Leu Ile Leu His
325 330 335

Ile Leu Asn Leu Cys Thr Pro Leu Thr Glu Thr Thr Ala Ile Lys Arg
340 345 350

Ser Ser Ser Ile His Gln Pro Ile Tyr Ala Asp Pro Ala His Pro Ile
355 360 365

Ala Gln Thr Asn Thr Thr Leu His Arg Ala His Ser Leu Val Ile Phe
370 375 380

Asp Gln Phe Leu Ser Asp Trp Thr Thr Ile Val Gly Asp Pro Ile Met
385 390 395 400

Ser Lys Trp Ile Ile Ile Thr Leu Gly Val Ser Ile Leu Leu Asn Gly
405 410 415

Phe Leu Leu Lys Gly Ile Ala Ser Gly Ser Ala Leu Gly Pro Gly Arg
420 425 430

Ala Gly Gly Gly Gly Ala Ala Ala Ala Ala Ala Val Leu Leu Gly Ala
435 440 445

Trp Glu Ile Val Asp Trp Asn Asn Glu Thr Glu Thr Ser Thr Asn Thr
450 455 460

Pro Ala Gly Pro Pro Gly His Lys Asn Gln Asn Val Asn Leu Arg Leu
465 470 475 480

Ser Leu Glu Arg Asp Thr Gly Leu Leu Arg Tyr Gln Arg Glu Gln Ala
485 490 495

Tyr Gln Ala Gln Ser Gln Ile Leu Ala Pro Ile Ser Pro Val Ser Val
500 505 510

Ala Pro Val Val Ser Asn Gly Asn Gly Asn Ala Ser Lys Ser Ile Glu
515 520 525

Lys Pro Met Pro Arg Leu Val Val Pro Asn Gly Pro Arg Ser Leu Pro
530 535 540

Glu Ser Pro Pro Ser Thr Thr Glu Ser Thr Pro Val Asn Lys Val Ile
545 550 555 560

Ile Gly Gly Pro Ser Asp Arg Pro Ala Leu Asp Gly Leu Ala Asn Gly
565 570 575

Asn Gly Ala Val Pro Leu Asp Lys Gln Thr Val Leu Gly Met Arg Ser
580 585 590

Ile Glu Glu Cys Glu Glu Ile Met Lys Ser Gly Leu Gly Pro Tyr Ser
595 600 605

Leu Asn Asp Glu Glu Leu Ile Leu Leu Thr Gln Lys Gly Lys Ile Pro
610 615 620

Pro Tyr Ser Leu Glu Lys Ala Leu Gln Asn Cys Glu Arg Ala Val Lys
625 630 635 640

Ile Arg Arg Ala Val Ile Ser Arg Ala Ser Val Thr Lys Thr Leu Glu
645 650 655



Thr Ser Asp Leu Pro Met Lys Asp Tyr Asp Tyr Ser Lys Val Met Gly
660 665 670
Glu 

Ala Cys Cys Glu Asn Val Val Gly Tyr Met Pro Leu Pro Val Gly Ile
675 680 685

Ala Gly Pro Leu Asn Ile Asp Gly Glu Val Val Pro Ile Pro Met Ala
690 695 700



Thr Thr Glu Gly Thr Leu Val Ala Ser Thr Ser Arg Gly Cys Lys Ala
705 710 715 720

Leu Asn Ala Gly Gly Gly Val Thr Thr Val Ile Thr Gln Asp Ala Met
725 730 735

Thr Arg Gly Pro Val Val Asp Phe Pro Ser Val Ser Gln Ala Ala Gln
740 745 750

Ala Lys Arg Trp Leu Asp Ser Val Glu Gly Met Glu Val Met Ala Ala
755 760 765

Ser Phe Asn Ser Thr Ser Arg Phe Ala Arg Leu Gln Ser Ile Lys Cys
770 775 780

Gly Met Ala Gly Arg Ser Leu Tyr Ile Arg Leu Ala Thr Ser Thr Gly
785 790 795 800

Asp Ala Met Gly Met Asn Met Ala Gly Lys Gly Thr Glu Lys Ala Leu
805 810 815

Glu Thr Leu Ser Glu Tyr Phe Pro Ser Met Gln Ile Leu Ala Leu Ser
820 825 830

Gly Asn Tyr Cys Ile Asp Lys Lys Pro Ser Ala Ile Asn Trp Ile Glu
835 840 845



Gly Arg Gly Lys Ser Val Val Ala Glu Ser Val Ile Pro Gly Ala Ile
850 855 860



Val Lys Ser Val Leu Lys Thr Thr Val Ala Asp Leu Val Asn Leu Asn
865 870 875 880

Ile Lys Lys Asn Leu Ile Gly Ser Ala Met Ala Gly Ser Ile Gly Gly
885 890 895

Phe Asn Ala His Ala Ser Asp Ile Leu Thr Ser Ile Phe Leu Ala Thr
900 905 910

Gly Gln Asp Pro Ala Gln Asn Val Glu Ser Ser Met Cys Met Thr Leu
915 920 925

Met Glu Ala Val Asn Asp Gly Lys Asp Leu Leu Ile Thr Cys Ser Met
930 935 940

Pro Ala Ile Glu Cys Gly Thr Val Gly Gly Gly Thr Phe Leu Pro Pro
945 950 955 960

Gln Asn Ala Cys Leu Gln Met Leu Gly Val Ala Gly Ala His Pro Asp
965 970 975

Ser Pro Gly His Asn Ala Arg Arg Leu Ala Arg Ile Ile Ala Ala Ser
980 985 990

Val Met Ala Gly Glu Leu Ser Leu Met Ser Ala Leu Ala Ala Gly His
995 1000 1005

Leu Ile Lys Ala His Met Lys His Asn Arg Ser Thr Pro Ser Thr Pro
1010 1015 1020

Leu Pro Val Ser Pro Leu Ala Thr Arg Pro Asn Thr Pro Ser His Arg
1025 1030 1035 1040

Ser Ile Gly Leu Leu Thr Pro Met Thr Ser Ser Ala Ser Val Ala Ser
1045 1050 1055

Met Phe Ser Gly Phe Gly Ser Pro Ser Thr Ser Ser Leu Lys Thr Val
1060 1065 1070

Gly Ser Met Ala Cys Val Arg Glu Arg Gly Asp Glu Thr Ser Val Asn
1075 1080 1085

Val Asp Ala
1090

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 467 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma
(B) STRAIN: ATCC96594

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Tyr Thr Ser Thr Thr Glu Gln Arg Pro Lys Asp Val Gly Ile Leu
1 5 10 15

Gly Met Glu Ile Tyr Phe Pro Arg Arg Ala Ile Ala His Lys Asp Leu
20 25 30

Glu Ala Phe Asp Gly Val Pro Ser Gly Lys Tyr Thr Ile Gly Leu Gly
35 40 45

Asn Asn Phe Met Ala Phe Thr Asp Asp Thr Glu Asp Ile Asn Ser Phe
50 55 60

Ala Leu Asn Ala Val Ser Gly Leu Leu Ser Lys Tyr Asn Val Asp Pro
65 70 75 80

Lys Ser Ile Gly Arg Ile Asp Val Gly Thr Glu Ser Ile Ile Asp Lys
85 90 95

Ser Lys Ser Val Lys Thr Val Leu Met Asp Leu Phe Glu Ser His Gly
100 105 110

Asn Thr Asp Ile Glu Gly Ile Asp Ser Lys Asn Ala Cys Tyr Gly Ser
115 120 125

Thr Ala Ala Leu Phe Asn Ala Val Asn Trp Ile Glu Ser Ser Ser Trp
130 135 140

Asp Gly Arg Asn Ala Ile Val Phe Cys Gly Asp Ile Ala Ile Tyr Ala
145 150 155 160

Glu Gly Ala Ala Arg Pro Ala Gly Gly Ala Gly Ala Cys Ala Ile Leu
165 170 175

Ile Gly Pro Asp Ala Pro Val Val Phe Glu Pro Val His Gly Asn Phe
180 185 190

Met Thr Asn Ala Trp Asp Phe Tyr Lys Pro Asn Leu Ser Ser Glu Tyr
195 200 205

Pro Ile Val Asp Gly Pro Leu Ser Val Thr Ser Tyr Val Asn Ala Ile
210 215 220

Asp Lys Ala Tyr Glu Ala Tyr Arg Thr Lys Tyr Ala Lys Arg Phe Gly
225 230 235 240

Gly Pro Lys Thr Asn Gly Val Thr Asn Gly His Thr Glu Val Ala Gly
245 250 255

Val Ser Ala Ala Ser Phe Asp Tyr Leu Leu Phe His Ser Pro Tyr Gly
260 265 270

Lys Gln Val Val Lys Gly His Gly Arg Leu Leu Tyr Asn Asp Phe Arg
275 280 285

Asn Asn Pro Asn Asp Pro Val Phe Ala Glu Val Pro Ala Glu Leu Ala
290 295 300

Thr Leu Asp Met Lys Lys Ser Leu Ser Asp Lys Asn Val Glu Lys Ser
305 310 315 320

Leu Ile Ala Ala Ser Lys Ser Ser Phe Asn Lys Gln Val Glu Pro Gly
325 330 335

Met Thr Thr Val Arg Gln Leu Gly Asn Leu Tyr Thr Ala Ser Leu Phe
340 345 350

Gly Ala Leu Ala Ser Leu Phe Ser Asn Val Pro Gly Asp Glu Leu Val
355 360 365

Gly Lys Arg Ile Ala Leu Tyr Ala Tyr Gly Ser Gly Ala Ala Ala Ser
370 375 380

Phe Tyr Ala Leu Lys Val Lys Ser Ser Thr Ala Phe Ile Ser Glu Lys
385 390 395 400

Leu Asp Leu Asn Asn Arg Leu Ser Asn Met Lys Ile Val Pro Cys Asp
405 410 415

Asp Phe Val Lys Ala Leu Lys Val Arg Glu Glu Thr His Asn Ala Val
420 425 430

Ser Tyr Ser Pro Ile Gly Ser Leu Asp Asp Leu Trp Pro Gly Ser Tyr
435 440 445

Tyr Leu Gly Glu Ile Asp Ser Met Trp Arg Arg Gln Tyr Lys Gln Val
450 455 460

Pro Ser Ala
465

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 432 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Lys Glu Glu He Leu Val Ser Ala Pro Gly Lys Val He Leu Phe Gly 

L His Ala val Gly His Gly Val Thr Gly He Ala Ala Ser Val Asp 

20 25 
X.U AT, Cys Tyr Ala Leu Leu ser Pro thr Ala Thr Thr Thr Thr ser 

ser ser ^ Ser Ser As. Xle He Ser Leu Thr Asp Leu Asn 

Phe Thr Gin ser xrp Pro Val Asp Ser Leu Pro Trp Ser Leu Ala Pro 
65 ■'° 

TTp Thr Glu Al. ser He Pro Glu Ser Leu Cys Pro Thr Leu Leu 

30 85 

Tio Ala Gly Gin Gly Gly Asn Gly Gly Glu Arg 
Ala Glu He Glu Arg He Aia t.xy « jr ^ 



100 



CX« x,y, val Ala Thr Met Ala Phe Leu Tyr Leu Leu Val Leu Ser 
115 

Lys Gly Ly, Pro Ser Glu Pro Phe Glu X.u Thr Ala Ar, Ser Ala Leu 

130 

Pro Met Gly Ala Gly Leu Gly Ser Ser Ala Ala Leu Ser Thr Ser Leu 
145 

Ala Leu val Phe Leu Leu His Phe Ser His Leu Ser Pro Thr Thr Thr 

165 

Gly Arg Glu Ser Thr He Pro Thr Ala Asp Thr Glu Val lie Asp Lys 

180 

xrp Ala Phe Leu Ala Glu Lys val He His Gly Asn Pro Ser Gly He 

195 200 
ASP Asn Ala val Ser Thr Ar| Gly Gly Ala Val Ala Phe Lys Ar« Lys 

210 215 
Ile Glu Gly Lys Gln Glu Gly Gly Met Glu Ala Ile Lys Ser Phe Thr

Ser lie Arg Phe Leu lie Thr A-P Ser Ar, He Gly Ar« Asp Thr Arg 

245 * 
ser Leu Val Ala Gly Val A,n Ala Arg Leu He Gin Glu Pro Glu Val 
260 

val pro Leu Leu Glu Ala lie Gin Gin He Ala Asp Glu Ala lie 
275 280 
Arg Cys Leu Lys Asp Ser Glu Met Glu Arg Ala Val Met He Asp Arg 

290 295 
I^u Gin Asn Leu Val Ser Glu Asn His Ala His Leu Ala Ala Leu Gly 



305 



310 



val ser His Pro Ser Leu Glu Glu He He Arg He Gly Ala Asp Lys 



325 



pro Phe Glu l.u Arg Thr Lys Leu Thr Gly Al- Gly Gly Gly Gly Cys 

340 

Ala val Thr Leu Val Pro Asp Asp Phe Ser Thr Glu Thr Leu Gin Ala 



355 



Met Glu Thr Leu Val Gin Ser Ser Phe Ala Pro Tyr He Ala Arg 

370 

val Gly Gly Ser- Gly Val Gly Phe Leu Ser Ser Thr Lys Ala Asp Pro 

390 



385 



Glu ASP Gly Glu Asn Arg Leu Lys Asp Gly Leu Val Gly Thr Glu He 



405 



.sp Glu Leu ASP Arg Trp Ala Lys Thr Gly Arg Trp Ser Phe Ala 



420 



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 401 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Val His Ile Ala Thr Ala Ser Ala Pro Val Asn Ile Ala Cys Ile
1 5 10 15

Lys Tyr Trp Gly Lys Arg Asp Thr Lys Leu Ile Leu Pro Thr Asn Ser
20 25 30

Ser Leu Ser Val Thr Leu Asp Gln Asp His Leu Arg Ser Thr Thr Ser
35 40 45

Ser Ala Cys Asp Ala Ser Phe Glu Lys Asp Arg Leu Trp Leu Asn Gly
50 55 60

Ile Glu Glu Glu Val Lys Ala Gly Gly Arg Leu Asp Val Cys Ile Lys
65 70 75 80

Glu Met Lys Lys Leu Arg Ala Gln Glu Glu Glu Lys Asp Ala Gly Leu
85 90 95

Glu Lys Leu Ser Ser Phe Asn Val His Leu Ala Ser Tyr Asn Asn Phe
100 105 110

Pro Thr Ala Ala Gly Leu Ala Ser Ser Ala Ser Gly Leu Ala Ala Leu
115 120 125

Val Ala Ser Leu Ala Ser Leu Tyr Asn Leu Pro Thr Asn Ala Ser Glu
130 135 140

Leu Ser Leu Ile Ala Arg Gln Gly Ser Gly Ser Ala Cys Arg Ser Leu
145 150 155 160

Phe Gly Gly Phe Val Ala Trp Glu Gln Gly Lys Leu Ser Ser Gly Thr
165 170 175

Asp Ser Phe Ala Val Gln Val Glu Pro Arg Glu His Trp Pro Ser Leu
180 185 190

His Ala Leu Ile Cys Val Val Ser Asp Glu Lys Lys Thr Thr Ala Ser
195 200 205

Thr Ala Gly Met Gln Thr Thr Val Asn Thr Ser Pro Leu Leu Gln His
210 215 220

Arg Ile Glu His Val Val Pro Ala Arg Met Glu Ala Ile Thr Gln Ala
225 230 235 240

Ile Arg Ala Lys Asp Phe Asp Ser Phe Ala Lys Ile Thr Met Lys Asp
245 250 255

Ser Asn Gln Phe His Ala Val Cys Leu Asp Ser Glu Pro Pro Ile Phe
260 265 270

Tyr Leu Asn Asp Val Ser Arg Ser Ile Ile His Leu Val Thr Glu Leu
275 280 285

Asn Arg Val Ser Val Gln Ala Gly Gly Pro Val Leu Ala Ala Tyr Thr
290 295 300

Phe Asp Ala Gly Pro Asn Ala Val Ile Tyr Ala Glu Glu Ser Ser Met
305 310 315 320

Pro Glu Ile Ile Arg Leu Ile Glu Arg Tyr Phe Pro Leu Gly Thr Ala
325 330 335

Phe Glu Asn Pro Phe Gly Val Asn Thr Glu Gly Gly Asp Ala Leu Arg
340 345 350

Glu Gly Phe Asn Gln Asn Val Ala Pro Val Phe Arg Lys Gly Ser Val
355 360 365

Ala Arg Leu Ile His Thr Arg Ile Gly Asp Gly Pro Arg Thr Tyr Gly
370 375 380

Glu Glu Glu Ser Leu Ile Gly Glu Asp Gly Leu Pro Lys Val Val Lys
385 390 395 400

Ala

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Ser Thr Thr Pro Glu Olu Lys Lys Ala Ala Ar. Ala Lys Pne Glu 

L val PKe pro Val He Ala Asp Glu He Leu Asp Tyr Met Lys Gly 

20 

OXu Gly Met pro .la Glu ;aa Glu Trp Met .sn .ys .sn .eu ^ 

^ 1 pro Gly Gly Lys 1^ .sn Gly ^u Ser Val Val ^.P 

50 

.nr Tyr He U« l^u Ser Pro Ser Gly Lys .sp Xle Ser Glu Glu Glu 

t Leu Ly. Ala Ala lie Leu Gly Txp Cys Xle Glu Leu Leu Gin Ala 

^ PKe Leu val l Asp Asp Het Met ^ Ala Ser He Thr Ar. Ar. 
' 100 ■^^^ 

OXy Gin Pro Cys Trp Tyr Lys Val Glu Gly V.l Ser Asn He Ala He 

115 

^ Asn Ala Phe Met Leu Glu Gly Ala He Tyr Phe Leu Leu Lys Lys 
130 

His Phe AT, Ly, Gin Ser Tyr Tyr Val Asp Leu Leu Glu Leu Phe His 
1*5 

val Thr Phe Gin Thr Glu I.U Gly Gin Leu He A-p Leu Leu Thr 

pro Glu ASP His Val Asp Leu Asp Lys Phe Ser Leu Asn Lys His 
180 

„.» He val val Tyr Lys Thr Ala Phe Tyr Ser Phe Tyr Leu Pro 
195 

V.1 Ala Leu Ala Met Ar« Met Val Gly Val Thr Asp Glu Glu Ala Tyr 

210 

.vs Leu Ala Leu Ser Xle Leu Xle Pro Met Gly Glu Tyr Phe Gin Val 

225 

Oln ASP ASP val L.U Asp Ala Phe Ar. Pro Pro Glu Xle Leu Gly Lys 
245 250 255

Ile Gly Thr Asp Ile Leu Asp Asn Lys Cys Ser Trp Pro Ile Asn Leu
260 265 270

Ala Leu Ser Pro Ala Ser Pro Ala Gln Arg Glu Ile Leu Asp Thr Ser
275 280 285

Tyr Gly Gln Lys Asn Ser Glu Ala Glu Ala Arg Val Lys Ala Leu Tyr
290 295 300

Ala Glu Leu Asp Ile Gln Gly Lys Phe Asn Ala Tyr Glu Gln Gln Ser
305 310 315 320

Tyr Glu Ser Leu Asn Lys Leu Ile Asp Ser Ile Asp Glu Glu Lys Ser
325 330 335

Gly Leu Lys Lys Glu Val Phe His Ser Phe Leu Gly Lys Val Tyr Lys
340 345 350

Arg Ser Lys
355

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GGNAARTAYA CNATHGGNYT NGGNCA 26 
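
Several primers in this listing, such as SEQ ID NO: 11, are degenerate: they are written with IUPAC ambiguity codes (N, R, Y, H and so on), so each entry stands for a pool of distinct oligonucleotides. The Python sketch below is an illustration only and is not part of the original specification. It counts how many distinct sequences such a primer represents. The helper names `degeneracy` and `expand` are hypothetical, and the only input taken from the listing is the SEQ ID NO: 11 string with its spacing removed.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence of a (short) degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


# SEQ ID NO: 11 as given in the listing, written without the block spacing.
seq_id_11 = "GGNAARTAYACNATHGGNYTNGGNCA"
print(len(seq_id_11), degeneracy(seq_id_11))
```

Calling `expand` on a full-length primer produces the whole pool, so in practice it is only convenient for short fragments such as a single codon, for example `expand("AAR")`.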



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 26 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TANARNSWNS WNGTRTACAT RTTNCC 26

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GAAGAACCCC ATCAAAAGCC TCGA 

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AAAAGCCTCG AGATCCTTGT GAGCG 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
AGAAGCCAGA AGAGAAAA 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
TCGTCGAGGA AAGTAGAT



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GGTACCATAT GTATCCTTCT ACTACCGAAC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GCATGCGGAT CCTCAAGCAG AAGGGACCTG



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GCNTGYTGYG ARAAYGTNAT HGGNTAYATG CC 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
ATCCARTTDA TNGCNGCNGG YTTYTTRTCN GT



(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GGCCATTCCA CACTTGATGC TCTGC 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GGCCGATATC TTTATGGTCC T



(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
GGTACCGAAG AAATTATGAA GAGTGG 



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
CTGCAGTCAG GCATCCACGT TCACAC 



(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
GCNCCNGGNA ARGTNATHYT NTTYGGNGA 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
CCCCANGTNS WNACNGCRTT RTCNACNCC 



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

ACATGCTGTA GTCCATG 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

ACTCGGATTC CATGGA






(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 
TTGTTGTCGT AGCAGTGGGT GAGAG

(2) INFORMATION FOR SEQ ID NO:30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GGAAGAGGAA GAGAAAAG



(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TTGCCGAACT CAATGTAG 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
GGATCCATGA GAGCCCAAAA AGAAGA 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Phaffia rhodozyma 

(B) STRAIN: ATCC96594 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GTCGACTCAA GCAAAAGACC AACGAC



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
HTNAARTAYT TGGGNAARMG NGA 



(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
GCRTTNGGNC CNGCRTCRAA NGTRTANGC 



(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
CCGAACTCTC GCTCATCGCC 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CAGATCAGCG CGTGGAGTGA

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CARGCNTAYT TYYTNGTNGC NGAYGA

(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CAYTTRTTRT CYTGDATRTC NGTNCCDATY TT 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
ATCCTCATCC CGATGGGTGA ATACT



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
AGGAGCGGTC AACAGATCGA TGAGC 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GAATTCATAT GTCCACTACG CCTGA



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GTCGACGGTA CCTATCACTC CCGCC

Claims

1. An isolated DNA sequence, which codes for an enzyme involved in the mevalonate pathway or the pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate.

2. An isolated DNA sequence according to claim 1, wherein said enzyme has an activity selected from the group consisting of 3-hydroxy-3-methylglutaryl-CoA synthase activity, 3-hydroxy-3-methylglutaryl-CoA reductase activity, mevalonate kinase activity, mevalonate pyrophosphate decarboxylase activity and farnesyl pyrophosphate synthase activity.

3. An isolated DNA sequence according to claim 1 or 2, which is characterized in that

(a) the said DNA sequence codes for the said enzyme having an amino acid sequence selected from the group consisting of those described in SEQ ID NOs: 6, 7, 8, 9 and 10, or

(b) the said DNA sequence codes for a variant of the said enzyme selected from (i) an allelic variant, and (ii) an enzyme having one or more amino acid addition, insertion, deletion and/or substitution and having the stated enzyme activity.

4. An isolated DNA sequence according to any one of claims 1-3, which can be derived from a gene of Phaffia rhodozyma and is selected from:

(i) a DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5;

(ii) an isocoding or an allelic variant for the DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5; and

(iii) a derivative of a DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5 with addition, insertion, deletion and/or substitution of one or more nucleotide(s), and coding for a polypeptide having the said enzyme activity.

5. An isolated DNA sequence, which is selected from:

(i) a DNA sequence represented in SEQ ID NO: 3;

(ii) an isocoding or an allelic variant for the DNA sequence represented in SEQ ID NO: 3; and

(iii) a derivative of a DNA sequence represented in SEQ ID NO: 3 with addition, insertion, deletion and/or substitution of one or more nucleotide(s), and coding for a polypeptide having the mevalonate kinase activity.

6. An isolated DNA sequence as claimed in claim 1 or 2 and which is selected from:

(i) a DNA sequence which hybridizes under standard conditions with a sequence as shown in SEQ ID NOs: 1 - 10 or its complementary strand or fragments thereof; and

(ii) a DNA sequence which does not hybridize as defined in (i) because of the degeneration of the genetic code but which codes for polypeptides having exactly the same amino acid sequence as shown in SEQ ID NOs: 1 - 10 or those encoded by a DNA sequence as defined above under (i).

7. A vector or plasmid comprising a DNA sequence as defined in any of claims 1-6.

8. A host cell which has been transformed or transfected by a DNA sequence as claimed in any one of claims 1 to 6, or a vector or plasmid as claimed in claim 7.

9. A process for producing an enzyme involved in the mevalonate pathway or the pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate, which comprises culturing a host cell as claimed in claim 8, under the conditions conducive to the production of said enzyme.

10. A process for the production of isoprenoids or carotenoids, preferably astaxanthin, which comprises cultivating a host cell as claimed in claim 8 under suitable culture conditions.
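
The claims refer to "isocoding" variants and to sequences that differ only through the degeneracy of the genetic code, that is, different DNA sequences that encode exactly the same amino acid sequence. The Python sketch below is an illustration only and forms no part of the claims. It shows one way such a check could be expressed, assuming the standard genetic code. The function names `translate` and `is_isocoding` and the two short example sequences are hypothetical and are not taken from SEQ ID NOs: 1 to 5.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence in frame 1; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_isocoding(seq_a, seq_b):
    """True when two DNA sequences encode the same amino acid sequence."""
    return translate(seq_a) == translate(seq_b)


# Hypothetical example: two different 12-mers that both encode Gly-Lys-Tyr-Thr.
variant_1 = "GGTAAGTACACT"
variant_2 = "GGCAAATATACG"
print(translate(variant_1), is_isocoding(variant_1, variant_2))
```

The check compares translations only. It says nothing about hybridization behaviour or enzyme activity, which the claims treat as separate criteria.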
(MEVALONATE PATHWAY)

acetyl-CoA
| + acetyl-CoA
acetoacetyl-CoA
| hmc (+ acetyl-CoA)
3-hydroxymethyl-3-glutaryl-CoA
| hmg
mevalonic acid
| mvk
mevalonate-phosphate
|
mevalonate-pyrophosphate
| mpd
isopentenyl-pyrophosphate (IPP)
|
dimethylallyl-pyrophosphate

(ISOPRENE BIOSYNTHESIS)

geranyl-pyrophosphate (GPP)
| fps
farnesyl-pyrophosphate (FPP) --> cholesterol (ergosterol)
|
geranylgeranyl-pyrophosphate (GGPP)

Side branches also shown: ubiquinone, vitamin K, vitamin E, chlorophyll

(CAROTENOGENIC PATHWAY)

phytoene
|
lycopene
|
β-carotene
|
astaxanthin

Fig. 1 Biosynthetic pathway from acetyl-CoA to astaxanthin in P. rhodozyma